< 

> 

STATISTICS IN BIOLOGY AND PSYCHOLOGY 












Scanned by CamScanner 












SOME RELATED BOOKS 


Debajyoti Das

Biochemistry 

Biophysics and Biophysical Chemistry 


P. K. Giri 
Jiban Banerjee 

Statistical Tools and Techniques 


Saha & Paul 

Biostatistics 


Pallab Basu 

Biochemistry Laboratory Manual 


Prakash Chandra Dhara 

Computer in Biological Science 





STATISTICS IN BIOLOGY 
AND PSYCHOLOGY 


DEBAJYOTI DAS

Reader in Physiology (retired) 

Presidency College, Kolkata 
Guest-Faculty at Bose Institute, Kolkata 
Former Guest Lecturer in Physiology 
University of Calcutta 

Former Guest Teacher in Human Physiology, 
Microbiology, Zoology and Aquaculture 
Vidyasagar University 


ARATI DAS

Reader in Psychology (retired) 

Bethune College, Kolkata 

Former Guest Teacher in Human Physiology 

Vidyasagar University 

Former Counsellor in Psychology (Nursing) 
Indira Gandhi National Open University 


SIXTH EDITION 



ACADEMIC PUBLISHERS 

KOLKATA 700 073 

E-mail : contact@academicpublishers.in
Website : www.academicpublishers.in





© Copyright reserved by the authors 


First published August 1981 
Second edition November 1993
Third edition November 1998 
Fourth edition July 2003 

Second print January 2004 
Third print July 2005 
Fifth edition April 2008 
Sixth edition June 2010 

Second print January 2012 
Third print June 2013 
Fourth print August 2014 
Fifth print August 2015 


ISBN 978-93-80599-04-5






Price rupees three hundred only. 




Published by Bimal Dhur of Academic Publishers, 5A Bhawani Dutta Lane, Kolkata 700 073 ; typeset
by Saugata Ghosh of White Hall Computers, 13 IB A. J. C. Bose Road, Kolkata ; and printed by
Dipankar Dhar at Rajendra Offset and Graphics, 11 Panchanan Ghosh Lane, Kolkata.








To 

Dr. ASOKE GOPAL DATTA 
my teacher 





ACKNOWLEDGEMENTS 


The kind cooperation of the following persons and organizations is
acknowledged :

... for permission to reprint or reproduce ... 1964 ... ;

Longman Group, London, for ... Frank Yates ... ;

... Fundamental Statistics in Psychology and Education (6th edition, 1978) by J. P. Guilford ...











PREFACE 


For three decades now, this book has been proffering the fundamental principles and 
applications of statistics in the higher studies and scientific investigations of Life Sciences, 
Psychology and Education. The authors deem this occasion opportune to pay their humble 
homage to their mentor, the late Rajendra Nath Dhur, but for whose encouragement the first
edition of this book would not have seen the light of the day. 

The authors persist in the present edition in describing the use of statistics for determining the
sample size, drawing samples based on laws of probability, presenting the experimental data in
tabulated, graphical and pictorial forms, elucidating the properties of normal, Student's t, binomial
and Poisson probability distributions, working out the reliability and validity of tests, testing the
significance of the observed results from the probability of correctness of the null hypothesis,
estimating errors of inference, analyzing experimental data with numerous parametric and
nonparametric tests such as t tests, Wilcoxon's tests, Mann-Whitney test, analysis of variance and
Kruskal-Wallis test, using chi square test and G test to find the goodness of fit between the
observed distribution and a chosen probability distribution, assessing the correlations between
variables by Pearson's, Spearman's and Kendall's tests, predicting the probable value of a variable
from the values of other variables correlated with it, working out item analysis and factor
analysis for a psychological test, and standardizing such a test. Mathematical complexities have
been confined to what are needed in the current undergraduate and postgraduate syllabi and the
subsequent applications of statistics to the experiments in the relevant disciplines, but without any
sacrifice of lucidity, clarity, precision and essential methodology. For every statistical method,
problems in the form of worked-out sums have been cited as Examples from various relevant
disciplines. Different procedures of experimental designs have been systematically elucidated. Basic
steps in working out and interpreting several useful statistical tests have been summarized in a
chapter in the form of flowcharts for ready reference of the learner. Exhaustive glossaries are
given after each chapter for easy access to basic terms and concepts. Numerous sample questions
and statistical sums have also been annexed towards the end of this book.

The authors would deem their endeavours rewarded if the intended readers (students of
undergraduate, postgraduate and pre-doctoral levels as well as researchers of the relevant
disciplines) find this edition more helpful to them.

The authors express their gratitude to Mr. Bimal Kumar Dhur and Mr. Dipankar Dhar for their
sincere unhesitant support and unbounded cooperation in publishing this edition.

The authors also convey their appreciation to Mr. Saugata Ghosh of M/s. White Hall Computers
and Mr. Biswajit Seal of M/s. Academic Publishers for their roles in the publication work.
Kolkata, 

Bengali New Year Day, 

15 April 2010. 


Debajyoti Das 
Arati Das 




THANKSGIVING


... for the encouragement ...

Prof. H. Y. Mohan ... ; Prof. Tushar Kanti Ghosh ; ... of Delhi ; ... Ananga Moh... ;
... Dasgupta of the University of Calcutta ; Prof. ... ; Prof. Pritha Mukherjee ; ...
... Maity of Vidyasagar University ; ... Prakash Dhara ; Prof. Debidas Ghosh ; Dr. ...
Dr. Subram Ghosh of ... ; ... of Kalyani University ; ...
... Bindu Banerji of Uttarpara ... ; ... Chaudhury of City College ; ...
... Pratim Chakraborty of ... College ; Dr. Dilip Kumar ... ; ...
... College ; Dr. Asima Das and Dr. ... Mahavidyalaya ; Dr. Shyamali ...
... of Surendranath College ; Dr. ... Ray of Serampore College ; ...
Ms. Sumita ... ; ... of Rammohun College ; ...
... Chatterji of ... College ; ...
... and Prof. Tanya Das of ... ; ... of Hooghly Mohsin College ...






CONTENTS 


1. Statistics, variables and samples

Applications of statistics ; Variables ; Population and sample ; Sampling methods ;
Parameter and statistic ; Glossary.

2. Presentation of data

Frequency distributions ; Pie diagram ; Bar diagram ; Frequency polygon ;
Histogram ; Frequency distribution curve ; Ogives ; Scattergram ; Linear graph ;
Exponential curve ; Semi-logarithmic graph ; Glossary.

3. Statistics of location

Classification ; Mean ; Geometric mean ; Quantiles or fractiles ; Percentile ranks ;
Median ; Mode ; Glossary.

4. Statistics of dispersion

Measures of dispersion ; Range ; Mean deviation ; Standard deviation ; Variance ;
Central moments ; Quartile deviation ; Coefficient of variation ; Coefficient of
quartile deviation ; Coefficient of dispersion ; Glossary.

5. Sampling statistics

Sampling errors ; Sampling distributions ; Standard errors ; Standard scores ;
Degrees of freedom ; Glossary.

6. Probability distributions

Probability ; Probability distributions ; Normal distribution ; Best-fitting normal
distribution ; Skewness ; Kurtosis ; Student's t distribution ; Confidence intervals ;
Binomial distribution ; Poisson distribution ; Glossary.

7. Testing of hypothesis

Difference between means ; Null hypothesis ; Levels of significance ; Errors of
inference ; One-tail and two-tail tests ; Significance of difference between means
using z scores ; t tests for significance of difference between means ; Glossary.

8. Correlation and regression

Correlation ; Product-moment correlation ; Partial correlation ; Multiple correlation ;
Spearman's rank correlation ; Kendall's rank correlation ; Point biserial r ;
Biserial r ; Yule's phi coefficient ; Tetrachoric r ; Contingency coefficient ;
Regression ; Multiple regression ; Glossary.




9. Nonparametric statistics

Nonparametric statistics ; Chi square tests ; G test ; Wilcoxon signed rank test ;
Wilcoxon composite rank test or rank sum test ; Mann-Whitney U test ; Median
test ; Glossary.

10. Psychological test construction

Psychological variables ; Principles of test construction ; Power and ... ;
Item analysis ; Reliability of a test ; Validity of a test ; ... ;
Factor analysis ; Standardization and norms ; Glossary.

11. Analysis of variance

Experimental design ; Analysis of variance ; One-way anova ; Multiple comparison
tests ; Kruskal-Wallis nonparametric anova ; Two-way anova ; Glossary.

Steps in statistical tests

A significance test ; t test for (X̄₁ − X̄₂) of large independent groups ; t test for
(X̄₁ − X̄₂) of small independent groups ; t test for (X̄₁ − X̄₂) of small single group ;
Using z scores for (X̄₁ − X̄₂) of large independent groups ; Mann-Whitney U test for
(X̄₁ − X̄₂) ; Chi square test for goodness of fit ; Product-moment correlation using
sum of products ; Product-moment correlation using raw scores ; Simple linear
regression of Y on X using sum of products and sum of squares ; Simple linear
regression of Y on X using raw scores ; Kruskal-Wallis nonparametric anova ;
One-way anova.

Bibliography
Sample questions
Appendix : Tables
Index









1. STATISTICS, VARIABLES AND SAMPLES 


Each experiment or test explores a property or
event which varies from individual to individual
in a group, or from time to time in the same
individual. Dependable inferences can seldom be
drawn by mere inspection of the
experimental observations. So, the experimentally
obtained data need statistical treatment for proper
inferences. Statistics is also used in the scientific
designing of experiments and tests.

1.1 APPLICATIONS OF STATISTICS 

Statistics is the science of the methodology for 
the scientific collection, systematic presentation, 
mathematical analysis and interpretation of the 
data, and for drawing inferences about the 
explored property or phenomenon in the relevant 
population. In this respect, statistics has the 
following basic applications. 

(a) Statistics is used in designing experiments 
scientifically and minimizing experimental errors. 

(b) It is used in drawing a representative
sample from the population for an experiment or 
survey, and in fixing the sample size for 
dependable inferences. 

(c) Statistics is used for the systematic 
arrangement of the observed data to express them 
in common communicable forms. It thus provides 
for a meaningful description or presentation of 
the studied property or phenomenon. 

(d) There is always a probability that the
observed results of an experiment, using a sample
instead of the entire population, occurred by
chance in the particular sample and would not
occur if the entire population were studied.
Statistics is used in assessing this probability of
chance occurrence of the observed results ; the
inference is then drawn according to whether this
probability is low or high. Thus, statistics
helps to generalize the findings of a sample to the
entire population.

(e) Statistics also estimates the probability of 
errors in the inference. 

(f) Statistics is used in estimating the reliability
of an experimental method or test in measuring 
the specific relevant property, through a 
mathematical assessment of the consistency of 
results on repeating the test on the same sample. 

(g) It is also used in estimating the validity of 
a test, i.e., the capacity of that test to measure the 
intended property in exclusion of other closely 
related ones. 

(h) Statistics explores and quantifies the 
magnitude and direction of relation between the 
changes of two or more properties, events or 
phenomena, as well as their cause and effect 
relations. 

(i) It may be applied to predict mathematically 
the most probable value of a property or event in 
an individual or case, from the observed value(s) 
of one or more of other related properties or events 
in that individual or case. 

(j) The causative factors for the variations of a
property or event are explored by the statistical 
method of analysis of variances. 

In such various ways, statistics is used to study 
numerous aspects of human problems like 
individual differences in physicochemical 
properties and psychological characteristics, 
educational, epidemiological, psephological,
ergonomic, industrial and population problems, 
pollution hazards, market and employment 
surveys, medical therapy and scientific research. 




1.2 VARIABLES 

A variable is any property, phenomenon or
event that varies in quantity or quality either
spatially, i.e., from case to case, individual to
individual, place to place, organ to organ, and cell
to cell even at the same instant, or temporally, i.e.,
from time to time even in the same case, in the
same individual, in the same tissue, in the same
cell or at the same location. Almost all properties
and constituents of a living being are variables ;
statistics is mostly concerned with the study of
such variables. For example, even when measured
at the same time, the systolic arterial pressure may
differ in different arteries of a person ; the trunk
length, wing length or tracheal ventilation volume
differs from insect to insect of the same species ;
the interorbital width differs from bird to bird of
a species ; the cell count differs from culture to
culture of the same micro-organism ; the blood
sugar, height, weight, blood volume, surface area,
vital capacity, skinfold thickness, urine volume,
oral temperature, memory, learning ability,
intelligence, job performance, lens power and O2
consumption differ from human to human even
at the same moment ; the animals of a species
differ in sex, age, phenotype, genotype, litter size,
femur length, coat color and ferocity ; atmospheric
PO2 differs at different altitudes at the same
moment while the sea water temperature differs
at the same time from place to place. All such are
spatial variations. In contrast, the blood
sugar, oral temperature, blood pressure, urinary
NPN concentration, blood volume, sweating
rate, height, body weight, body surface area,
alertness, motivation level, O2 consumption and
urine volume may vary in the same person at
different times ; the trunk length, wing length
or tracheal ventilation changes in the same
insect with age ; the femur length, interorbital
width and coat color change in the same animal
as it grows. All such are examples of temporal
variations.


Classes of variables 

1. Measurement variables :

Magnitudes or amounts of these variables can
be measured or counted, and expressed in
numerical values. Examples : age, height, weight,
litter size, femur length, wing length, interorbital
width, heart rate, respiratory rate, blood volume,
blood pressure, hemoglobin concentration, blood
sugar, urinary NPN, bacterial or blood cell counts,
reaction time, nerve impulse velocity, body
temperature, vital capacity, body surface area,
respiratory CO2 output, and cell size. Numerical
values of a measurement variable are called its
scores. The scores of a measurement variable show
clearly : (i) whether two such scores are different
from or equal to one another, (ii) whether one of
them is higher or lower than the other, and (iii)
by how much one score exceeds or falls short of
the other. For example, if two persons have pulse
rates of 100 and 72 per minute respectively, their
pulse rates differ, the former being higher than
the latter by as much as 28 beats per minute.
Measurement variables belong to the following
two sub-classes.

(a) Continuous variables : These can be
measured and expressed not only in whole units,
but also in any infinitely small fraction of such
units. So, continuous variables can have an infinite
number of scores even between two whole units ;
their scores can increase or decrease by
infinitely small fractions of whole units, with no
limit to the smallness of those fractions. So, the
possible scores of such a variable form a
continuous series with no intervening gaps. For
example, the height of a human or the wing length
of an insect can be measured up to the level of
centimetres, millimetres, nanometres and still
lower, depending upon the degree of fineness of
the measuring instrument and the precision of
measurement. Thus, a continuous variable should
yield continuous metric data with no real gaps ;
however, such data from an actual study show
apparent gaps owing to (i) limitations of fineness
and precision of measurement, and (ii) accidental
absence of individuals having the missing
intervening scores in the sample. Examples : age,
height, weight, surface area, blood or urine
volume, blood sugar, serum cholesterol, body
temperature, vital capacity, pH, impulse velocity,
reaction time, cell size, blood pressure, alveolar
PO2, organelle dimensions, microbial fermentation
rates, and the scale scores of intelligence, anxiety,
interest, aptitude, achievement, etc.

(b) Discontinuous, discrete or meristic 
variables : These can have only certain specific 
values, and no intermediate value between those 
fixed ones. Their scores can be only in whole units 
in most cases, and up to certain fixed fractions of
the whole unit in other cases, because of the 
impossibility or impracticability of further 
subdivision of their measures. So, their scores may 
increase or decrease by that fixed amount only, 
and not by any smaller fraction of the unit. Their 
scores thus form discontinuous data with real gaps 
between consecutive scores. Examples include the 
enumeration data from the counting of individuals 
or cases, such as litter size, family size, number 
of animal digits, heart rate, respiratory rate and 
impulse frequency — such counts are limited to 
whole numbers of cases and cannot be reasonably 
extended to fractions of cells, digits, litter mates, 
respiratory cycles, etc. Heart rate, however, is 
often treated as a continuous variable during 
statistical analysis. 
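
Returning to the pulse rates cited earlier for measurement variables, the three comparisons that such scores allow can be sketched in a few lines of Python. This is only an illustrative sketch ; the function name `compare_scores` is hypothetical and not part of any library.

```python
def compare_scores(a, b):
    """Compare two scores of a measurement variable.

    Answers the three questions that measurement scores can settle:
    are they different, which one is higher, and by how much.
    """
    if a == b:
        higher = None
    elif a > b:
        higher = "first"
    else:
        higher = "second"
    return {"different": a != b, "higher": higher, "difference": abs(a - b)}


# Pulse rates (beats per minute) of the two persons in the text's example.
result = compare_scores(100, 72)
print(result)
```

Ranks of an ordinal variable would support only the first two answers, and classes of a nominal variable only the first.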

There are two types of scales for measurement
variables. (i) Interval scale : Some measurement
variables such as temperature, intelligence,
memory and learning ability cannot be absolutely
absent, with a zero value, anywhere or in any
case ; so, the scale used for such a variable has no
true zero point. Each such interval scale may
start from any conveniently chosen arbitrary zero
point ; e.g., the Celsius and Fahrenheit scales
for temperature, which have different arbitrary zero
points. In the absence of a true zero point, two
scores in the interval scale cannot be multiplied
or divided by each other and a ratio of the two
cannot be directly worked out. (ii) Ratio scale :
Many measurement variables such as height or
length, mass, weight, volume, pitch and loudness
can be totally absent so as to have a zero value in
some cases or at some places ; so, the scale for
such a variable has a true zero point. All such
ratio scales for a particular variable (e.g., both
centimetre and inch scales for length) start from
the same true zero point. The existence of a true
zero point enables the scores in a ratio scale to be
multiplied or divided by each other and to be used
for working out their ratios. Ratio scales are far
more extensively used in life sciences than in
psychology ; however, ratios between stimuli in
psychophysics, calorie expenditures in various job
activities, and times taken in assembly tests in
industrial psychology are worked out using ratio
scales.
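
The difference between the two scales can be checked numerically. The sketch below uses arbitrary illustrative values (temperatures of 40 and 20 degrees Celsius, lengths of 10 and 5 centimetres) and standard unit conversions ; the function names are hypothetical.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from the Celsius to the Fahrenheit scale."""
    return c * 9 / 5 + 32


def cm_to_inch(cm):
    """Convert a length from centimetres to inches."""
    return cm / 2.54


# Interval scale: the ratio of two temperatures changes with the scale chosen,
# so the ratio carries no meaning of its own.
ratio_celsius = 40 / 20
ratio_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)

# Ratio scale: the ratio of two lengths is the same whatever the unit.
ratio_cm = 10 / 5
ratio_inch = cm_to_inch(10) / cm_to_inch(5)

print("temperature ratios:", ratio_celsius, round(ratio_fahrenheit, 2))
print("length ratios:", ratio_cm, round(ratio_inch, 2))
```

The two temperature ratios disagree because each scale starts from its own arbitrary zero point, whereas the two length ratios agree because both scales share the same true zero point.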

2. Ordinal or ranked variables : 

These consist of variables like attentiveness,
leadership quality, social adjustment,
trustworthiness, ferocity and docility, which vary
distinctly in magnitude or intensity among
members of a population, but their magnitudes
cannot be measured quantitatively. Individuals of
a sample may be graded into ranks in ascending
or descending orders according to the relative
magnitudes of the relevant variable in them ; but
these ranks give neither any absolute quantitative
measure of the variable in an individual nor the
magnitude of differences between two individuals.
Thus, the ranks of an ordinal variable show (i)
whether two individuals are different from or equal
to one another and (ii) whether one is higher or
lower than the other with respect to that variable,
but (iii) cannot indicate by how much one is higher
or lower than the other, and (iv) give no indication
about whether the difference in magnitude of the
variable between any two consecutive ranks
equals, exceeds or falls short of the difference
between two other consecutive ranks. Thus, the
difference in ferocity between animals ranked 1
and 2 may very well be either equal to, or higher
than or lower than the difference between animals
of two other consecutive ranks. Moreover, the set
of ranks in a sample constitutes a discontinuous
series with real intervening gaps, and thus
constitutes a discontinuous variable.

3. Nominal variables or attributes :

Some variables such as sex, race, caste,
religion, profession, species, blood groups, mother
tongue, eye color, skin color, hair color,
living-and-dead, pregnant-nonpregnant, success-failure,
HIV positive-negative, and some psychological
stimuli cannot be measured quantitatively nor
expressed in numerical scores. Nor can the
individuals of a sample be ranked about that
variable because of no discernible difference in
magnitude of the variable. Such attributes or
nominal variables can only be studied
qualitatively. According to the qualitative
similarity or difference of the members of a
population with respect to such a variable, the
latter is divided into classes with intervening gaps ;
an attribute, thus divided into only two classes,
is called a dichotomous variable ; e.g., sex,
pregnant-nonpregnant, success-failure.
Individuals of each such class are qualitatively
similar to each other and differ from those of other
classes, but these differences cannot be assessed
quantitatively. So, it can be stated (i) whether two
individuals are similar to or different from one
another about such a variable, but (ii) not whether
one is higher or lower than the other. Although
such a variable cannot be measured to give
numerical scores for the respective individuals of
a sample, the numbers of individuals or cases
counted in its different classes form enumeration
data for statistical treatment (Table 2.1).

4. Derived variables :

Each such variable consists of the
relations between the directly measured/counted
scores of two or more measurement variables,
expressed as their ratios, proportions or
percentages. Examples : respiratory quotient,
BMR, color index, cephalic index, hematocrit
value, renal clearance, clearance ratio, digestive ...
ratio, IQ, educational quotient, etc.

Variables in experiments

An experiment is undertaken to study the
changes of a particular variable (dependent
variable) in a sample on exposure of the latter to
other given variables (independent variables).
According to their roles, variables involved in the
experiment are categorized into dependent,
independent and relevant variables, each
belonging to any of the classes described earlier.

1. Dependent variable :

The variable, whose possible change due to
exposure of the sample to some other variable(s)
is being studied in an experiment, is called the
dependent variable in that experiment ; its
changes, if any, may be supposed to depend on
the effects of the other (independent) variable(s)
applied on the sample. Thus, while studying the
changes in blood sugar after insulin administration
to a sample of animals, the blood sugar
concentration whose changes are measured
constitutes the dependent variable ; in an
experiment on changes in respiratory rate after
the administration of a test agent to a sample, the
respiratory rates constitute the dependent variable.
The dependent variable in the first example,
viz., blood sugar, is a
continuous measurement variable ; but in the
second instance, respiratory rate is a dependent
variable of the discontinuous type. Again, in an
experiment on the effect of practice on the
rankings of workers in a given job performance,
the ranks in the job performance constitute the
dependent variable, belonging to the ordinal class.

Although changes of the dependent variable are
explored or measured in the experiment, this
variable is not controlled or “fixed” by the
investigator and is thus liable to random errors.
In each experiment, there is a single dependent
variable ; an investigation into the effects of an
applied independent variable (say, epinephrine)
on more than one dependent variable (say, blood
sugar and blood lactate) should be considered as
a combination of more than one experiment, done
simultaneously with the same sample using an
identical independent variable, but each having a
single dependent variable (viz., blood sugar and
blood lactate, respectively).
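
The idea of treating a study with two dependent variables as two experiments sharing one independent variable can be sketched as follows. The records, field names and numerical values are purely hypothetical placeholders, not data from any study, and `split_experiments` is an illustrative helper.

```python
# Hypothetical records: one row per animal, holding the applied dose of the
# independent variable and the two measured dependent variables.
records = [
    {"epinephrine_dose": 0.0, "blood_sugar": 90.0, "blood_lactate": 10.0},
    {"epinephrine_dose": 0.5, "blood_sugar": 120.0, "blood_lactate": 18.0},
    {"epinephrine_dose": 1.0, "blood_sugar": 150.0, "blood_lactate": 25.0},
]


def split_experiments(rows, independent, dependents):
    """Return one (independent, dependent) data set per dependent variable."""
    return {
        dep: [(row[independent], row[dep]) for row in rows]
        for dep in dependents
    }


experiments = split_experiments(
    records, "epinephrine_dose", ["blood_sugar", "blood_lactate"]
)
for name, pairs in experiments.items():
    print(name, pairs)
```

Each resulting data set pairs the same doses with a single dependent variable, so each can be analyzed as an experiment in its own right.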


In psychology, any behavioural variable, whose
changes are assessed in a test or experiment,
constitutes the dependent variable and is generally
called the criterion in that test. For example, in a
test for degrees of steadiness in a sample of
humans after different levels of practice,
steadiness is the dependent variable or criterion
while practice is the independent variable.
Similarly, in psychological tests for assessing job
satisfaction at different levels of attitude of
management, job satisfaction is the criterion while
management attitude is the independent variable.


2. Independent variable : 

The variable, whose effects on the dependent
variable are studied in an experiment, is called
the independent variable in that experiment ;
the changes of the independent variable are not
dependent on the changes of the dependent
variable in the relevant experiment. In an
experiment, the dependent variable is always a
single one, but there may be one or many
independent variables in an experiment and each
of them may belong to any of the classes of
variables described earlier. In an experiment on
the effect of insulin administration on blood sugar,
doses of insulin constitute the lone independent
variable of the continuous variable class ; in a
study of the difference in wing lengths between
males and females of an insect species, sex is the
single independent variable of the nominal
variable class ; in a study of the combined effects
of androgenic steroid administration and sex on
athletic performance, the doses of the androgenic
steroid constitute a continuous measurement
variable while sex is a nominal variable, both
being the independent variables in this experiment.

In psychology, the independent variable whose
effect on a given behavioural variable (criterion)
is explored in a test is called the predictor,
supposing that the criterion score of an individual
may be predicted from the quantity/nature of the
independent variable affecting him.

Independent variables may belong to any of
the following types, according as they are under
or beyond the control of the investigator.


(a) Treatment variables : These are extensively 
controlled by the investigator. Their qualitative 
forms, quantitative amounts or applied doses as 
well as methods of their application are either 
fixed or altered by the investigator in deliberate 
predetermined and precise manners in an 
experiment to study the effects of such changes 
on the dependent variable ; they are not otherwise 
allowed to change, vary or fluctuate at random, 
while in use in the experiment. They thus do not 
suffer from random errors during the experiment. 
Examples : applied doses of a hormone, drug, 
pesticide or radiation ; extirpation or 
transplantation of an organ ; pitch or loudness of 
a stimulating sound ; periods of practice; voltage 
or amplitude of stimulating current; intensity or 
wavelength of light stimulus. 

(b) Classification variables : They are not
directly controlled or “fixed” by deliberate
manipulations by the investigator ; rather, they
change at random owing to their inherent nature
and external factors, all beyond the control of the
investigator. So, they are liable to random errors.
Often, they have been existing and affecting the
dependent variable since long before the
experiment. In an experiment, the sample includes
cases either already exposed to different levels or
already belonging to different classes of the given
classification variable, and is subjected to the
assessment of the intended dependent variable.
Examples : sex, age, breed, genetic disorders,
cosmic rays, atmospheric pressure or temperature,
phenotypes, race, blood groups, intelligence
levels, temperament and personality.




In the cited experiment about the effect of
insulin administration on blood sugar, the 
independent variable is a treatment variable 
constituted by the “fixed” doses of insulin ; in a 
study of the difference in wing length between 
male and female insects, sex is a classification 
variable beyond the control of the investigator ; 
in a study of combined effects of androgenic 
steroid administration and sex on athletic 
performance, “fixed” doses of the administered 
steroid constitute a treatment variable while sex 
is a classification variable. 
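
One way to keep these roles explicit when recording a study design is sketched below. The class and field names are hypothetical conveniences, and the entries simply restate the three experiments cited in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndependentVariable:
    name: str
    variable_class: str  # e.g. "continuous measurement" or "nominal"
    control: str         # "treatment" or "classification"


# The three cited experiments, keyed by their single dependent variable.
experiments = {
    "blood sugar": [
        IndependentVariable("insulin dose", "continuous measurement", "treatment"),
    ],
    "wing length": [
        IndependentVariable("sex", "nominal", "classification"),
    ],
    "athletic performance": [
        IndependentVariable("androgenic steroid dose", "continuous measurement", "treatment"),
        IndependentVariable("sex", "nominal", "classification"),
    ],
}


def fixed_by_investigator(variable):
    """Treatment variables are fixed by the investigator; classification variables are not."""
    return variable.control == "treatment"


for dependent, independents in experiments.items():
    for var in independents:
        print(dependent, "<-", var.name, "| fixed:", fixed_by_investigator(var))
```

Keying the design by the dependent variable reflects the rule that each experiment has a single dependent variable but may have one or many independent variables.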


3. Relevant variables :

In every experiment or test, the dependent 
variable may be affected by numerous variables 
that are not intended to be used by the investigator 
as independent variables (predictors) to study their 
effects on the dependent variable (criterion), but 
may still affect the latter if not effectively 
controlled by the investigator. Such variables are 
called the relevant variables for the given 
experiment or test. They may be classified as 
follows. 


(a) Subject-relevant variables : These arise
from variations of numerous properties or qualities
of the subjects (individuals) chosen for the sample.
In life sciences, subject-relevant variables include
sex, race, strain, breed, body weight,
genotype, phenotype, etc., of the subjects. In
psychology, such variables include aptitude,
interest, motivation, personality, intelligence, etc.,
of the subjects.


(b) Situational relevant variables : These arise
from experimental situations and physical or social
environments. In life sciences, they include
... of experiments such as
... strength, temperature, density and ... ;
noise, distraction, humidity, ...
space and ventilation in field or work ...
In industrial psychology, they include
organizational and social variables such as
incentives, training, ... and attitude of
management.


(c) Sequence-relevant variables : These arise 
from the sequence or order of application of 
independent variables. In life sciences, they 
include the order of application of different doses 
or levels of drugs and other treatments on the 
subjects. In psychology, they include practice, 
fatigue and monotony affecting a criterion such 
as performance. 


The unwanted effects of relevant variables are 
sought to be minimized by experimental design 
such as randomizations of sampling and of 
application sequence of independent variable, use 
of equivalent or matched groups of subjects for 
different doses or levels of application of 
independent variable, and other counterbalancing 
techniques. 




1.3 POPULATION AND SAMPLE 

Population 

The vast group or aggregate of all animate or 
inanimate individuals, cases or events that possess 
some form or amount of the variable under 
investigation in an experiment, test or survey, 
constitutes the population for the latter. It is of 
the following two types. 

(a) Infinite population : Such a population is 
so vast and widespread that all its members cannot 
be counted and its size cannot be precisely 
determined ; e.g., population of all diabetic 
humans, all spawning salmons, all sickle cell 
anemia cases, all thalassemia patients, all Down 
syndrome cases, cholinergic neurons of all 
humans, all hypophysectomized white rats of a 
particular strain, all locusts exposed to a pesticide, 
and all fluoride-treated cultures of Lactobacillus 
casei, in the whole universe. 

(b) Finite population : Such a population is so 
limited in size and location that all its members 
may be precisely counted ; e.g., populations of 
all examinees of a secondary school-leaving 
examination [...], all [...] Asiad, 
all Mansarovar pilgrims of [...], [...] 




a city blood bank, and all crocodiles in a sanctuary. 

Measures like mean, median and standard 
deviation of any variable may be considered to 
remain more or less constant in a population, 
whether finite or infinite, because of its large size. 
But it is too laborious, expensive and 
impracticable to conduct any experiment or survey 
with the entire vast population ; even if so 
attempted, some of its members may be 
accidentally left out of the study without the 
knowledge of the investigator so that such 
omissions can be neither compensated nor 
estimated for the consequent errors. 

Sample 

A sample is a relatively small group of 
members (individuals, cases, events, etc.) chosen 
from a population for being subjected to an 
experiment, test or survey instead of that 
population. Use of a sample in an experiment or 
survey is the only practical solution of problems, 
cited above, in using the entire population in such 
work. Because observations recorded from a 
sample are to be used for a generalized inference 
about the entire population, the sample should be 
drawn by well-planned scientific methods to make 
it truly representative of the population. Such a 
representative sample should fulfil the following 
criteria. 


(a) Proportions of different types of individuals 
or events in the sample should conform to their 
respective proportions in the population. 


(b) Variations of the dependent variable among 
the members of the sample should not exceed the 
range predictable by the laws of probability from 
the corresponding variations in the population. 

(c) Different samples drawn from the same 
population should have fairly close summary 
values (statistics) like mean, so that we may 
assume that the arithmetic average of such 
statistics of different samples would coincide 
with the corresponding summary value 
(parameter) of the population. 


1.4 SAMPLING METHODS 

A sample should be fairly representative of the 
population, because (i) the observations made with 
the sample have to be used for generalized 
inferences about the entire population, and (ii) the 
errors of inference, arising from the planned 
exclusion of the rest of the population from the 
study, have to be estimated. So, the sample is to 
be drawn by well-planned methods, described 
below. 


1. Judgement sampling 

In this method, the investigator deliberately 
chooses some members of the population for 
forming the sample, judging those particular 
members as the representatives of the population, 
often depending arbitrarily on some quality or 
characteristics in them. No random sampling is 
used, depending on the laws of probability, for 
drawing the sample. Such a sample is frequently 
biased and devoid of any representative nature, 
because of (i) personal judgement in the choice 
and (ii) deliberate unplanned exclusion of other 
members of the population. So, such judgement 
sampling should be avoided. 


2. Probability sampling 


Probability sampling is done by choosing at 
random the individuals or cases from a population 
for inclusion in the sample, depending (i) on the 
laws of probability and (ii) on the frequency of 
each type of individual or case in the population, 
and (iii) leaving no scope for arbitrary choice by any 
person. This ensures that (i) individuals or cases 
of a larger section of the population would get 
chosen for the sample in proportionately higher 
number than those of a smaller section, and (ii) 
individuals or cases of each section or class of 
the population would be included in the sample 
in the same proportion as that in the population. 
Methods of probability sampling are as follows : 

(a) Simple random sampling : 

This consists of the random choice of 
individuals or cases for a sample, depending on 




the laws of probability, from the undivided whole 
of a small, finite and homogeneous population. 
Simple random sampling ensures that (i) no 
element of conscious or unconscious bias, whim 
or personal factor of the investigator affects the 
choice, (ii) each member of the population has an 
equal probability of being chosen for the sample, 
and (iii) the choice of any member of the 
population for the sample is independent of the 
choice or exclusion of any other member. The 
sample size, i.e., the number of individuals or cases 
in the sample, should be determined statistically 
because (i) the smaller the sample, the higher is 
the probability of exclusion of particularly the 
extreme and rare cases of the population, and (ii) 
a minimum sample size is required for reducing 
the errors due to the planned exclusion of the rest 
of the population. 

In the unsophisticated card drawal method, all 
members of the population are assigned serial 
numbers which are written singly on separate 
cards. The cards are shuffled and mixed together, 
and as many of them are then blindly picked up 
as the required sample size. The individuals 
bearing the serial numbers on the drawn cards are 
included in the sample. There are the following 
two ways of random sampling. 

(i) In random sampling with replacement, a 
member once chosen for the sample continues to 
be considered for all subsequent choices also. In 
the card drawal method, for example, a card once 
drawn would be replaced in the bunch of cards 
after noting its serial number ; this would give a 
chance to the same card for being drawn again. 
This procedure keeps the probability of choice of 
each individual identical and unchanged at any 
step of choice, because all the choices continue 
to be made from the unchanged total number of 
individuals. Thus, for each drawing from a 
population having a total of N members, 
every member has the same probability of 1/N of 
being chosen for the sample. But it is impracticable 
to work with a sample in which an individual 
gets included more than once due to the 
replacement. 

(ii) In random sampling without replacement, 
an individual once chosen is kept out of 
consideration for all subsequent choices ; a card 
once drawn is not replaced in the bunch of cards. 
Of course, the probability of being chosen for the 
sample rises from choice to choice instead of 
remaining identical, because with each choice the 
total number of individuals left for the next choice 
falls by 1. Thus, for the rth choice from a 
population of size N, the members left for that 
choice would number (N - r + 1), each with a 
probability of 1/(N - r + 1) for being chosen for 
the sample ; but for the next or (r + 1)th choice, 
this probability would rise to 1/(N - r). This 
progressive rise in probability is, however, 
negligibly low or practically absent for small 
samples from infinite (N = ∞) or large finite 
populations. 
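
The two ways of drawing can be compared in a short Python sketch. It is only an illustration under stated assumptions: the population size of 48, the seed and the helper name draw_probabilities are chosen here for the example and are not part of the text above.

```python
import random
from fractions import Fraction


def draw_probabilities(population_size, draws, replacement):
    """Probability of any one remaining member being chosen at each draw.

    With replacement, every draw is made from all N members, so the
    probability stays at 1/N. Without replacement, the rth draw is made
    from the (N - r + 1) members still left, so it is 1/(N - r + 1).
    """
    probabilities = []
    for r in range(1, draws + 1):
        remaining = population_size if replacement else population_size - r + 1
        probabilities.append(Fraction(1, remaining))
    return probabilities


N = 48  # assumed population size, for illustration only
with_replacement = draw_probabilities(N, 3, replacement=True)
without_replacement = draw_probabilities(N, 3, replacement=False)
print(with_replacement)     # 1/48 at every draw
print(without_replacement)  # 1/48, then 1/47, then 1/46

# The standard library offers both ways of drawing directly.
rng = random.Random(0)  # arbitrary seed, so that the run is repeatable
members = list(range(1, N + 1))
sample_with = rng.choices(members, k=11)    # a member may occur twice
sample_without = rng.sample(members, k=11)  # no member occurs twice
```

For a very large N the successive fractions 1/N, 1/(N - 1), 1/(N - 2) differ only negligibly, which is the point made above about small samples from large populations.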

Random number method : A more scientific 
and precise method of random sampling is based 
on a random number table of sequences of a large 
number of digits compiled and arranged at random 
with a random number generator. For sampling 
without replacement, all members of the 
population are assigned identity numbers serially, 
each such number comprising as many digits as 
those in the population size. If, for example, a 
sample of 11 individuals has to be drawn from 
among a total of 48, the identity numbers should 
be in two digits. However, many random numbers 
will have to be rejected before the sample size is 
fulfilled if only one identity number is allotted to 
each member ; so, after assigning the first series 
of identity numbers (say, 01 to 48) to the 
members, the allotment of identity numbers is 
continued in the same order (say, starting from 
49 and going up to 96 ; Example 1.4.1). Thus, the 
members bearing the identity numbers of 08 and 
11 respectively in the first series, bear also the 
second-series numbers of respectively 56 and 59 ; 
the remainder left after dividing any second-series 
number (say, 59) by the last first-series number 
(48 here), equals the first-series identity number 




(viz., 11 here) of the same individual, a remainder 
of 0 (as for 96) corresponding to the last 
first-series number (48). Next, any 
digit of any set of random numbers of the table is 
chosen at random ; passing either horizontally 
along the row or vertically along the column from 
this chosen digit and omitting none of the digits 
in this path, each set of as many consecutive digits 
as the digits in the population size is noted as a 
chosen random number and used as follows : (i) 
when a chosen random number coincides with a 
first-series identity number, the individual bearing 
that identity number is included in the sample ; 
(ii) when the chosen random number coincides 
with a second-series identity number, that number 
is replaced by the corresponding first-series 
identity number and the individual bearing the 
latter is included in the sample ; in the 
aforementioned Example 1.4.1, if a random 
number of 76 gets chosen, the individual bearing 
the identical second-series identity number of 76 
and thus the corresponding first-series identity 
number of 28 gets included in the sample ; (iii) 
any chosen random number higher than the last 
second-series identity number (96 here) is 
rejected ; (iv) random number of 00, if chosen, is 
also rejected ; (v) in case of sampling without 
replacement, as in the given example, any identity 
number getting chosen a second time is also 
rejected. These procedures are continued till the 
required sample size is reached. 


(b) Stratified random or quota sampling : 

This method is used if the population is large 
and heterogeneous with distinct subpopulations 
or strata, each differing from the others with 
respect to a property related to the variable under 
study, but being practically homogeneous in itself. 
Here, simple random sampling is applied 
separately to each stratum to draw that number of 
individuals from it as would correspond to the 
proportional size of that stratum in the population. 
Thus, for a sample of 160 individuals from a 
population having a male : female 
ratio of 55 : 45, a total of 0.55 × 160 or 88 males 
and 0.45 x 160 or 72 females should be chosen 
by separate simple random samplings from the 
respective strata of the population, ensuring a 
male : female ratio of 55 : 45 in the sample also. 
Evidently, while all members of a particular 
stratum have the same probability of being chosen 
for the sample, this probability varies from stratum 
to stratum according to their proportional sizes in 
the population. Stratified random sampling 
minimizes the errors due to the heterogeneity of 
the population and thereby increases the 
representative character of the sample. 
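
The proportional allocation worked out above (88 males and 72 females in a sample of 160) may be sketched in Python as follows. The stratum sizes of 550 and 450, the member labels and the function names are assumptions made only for this illustration.

```python
import random


def proportional_allocation(stratum_sizes, sample_size):
    """Number of members to draw from each stratum, in proportion to its size.

    Rounding is used here for simplicity; with awkward proportions the
    rounded quotas may not add up exactly to the sample size.
    """
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}


def stratified_sample(strata, sample_size, rng):
    """Simple random sampling, done separately within each stratum."""
    sizes = {name: len(members) for name, members in strata.items()}
    quota = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(members, quota[name])
            for name, members in strata.items()}


# Hypothetical population of 1000 with a male : female ratio of 55 : 45.
strata = {
    "male": [f"M{i}" for i in range(1, 551)],
    "female": [f"F{i}" for i in range(1, 451)],
}
rng = random.Random(1)  # arbitrary seed
sample = stratified_sample(strata, 160, rng)
counts = {name: len(chosen) for name, chosen in sample.items()}
print(counts)  # {'male': 88, 'female': 72}
```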

(c) Multistage sampling : 

This method is used when the population is 
vast and widespread. Utilizing some existing steps 
of the population, it is divided into a number of 
successive stages ranging from the entire 
population to the individuals. Simple random 
sampling is then applied at each stage separately. 
For example, for an experiment or test about an 
anthropometric variable or a learning ability of 
urban class X students of a state, a few of the towns 
may first be chosen at random as the first-stage 
units from amongst all the towns ; a number of 
schools are next chosen at random in the chosen 
towns as the second-stage units ; then a specific 
number of class X students are chosen at random 
from the chosen schools at the third and final stage 
to constitute the sample. 
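
The three stages of this example (towns, then schools, then students) may be sketched in Python as below. The nested dictionary of 10 towns, 5 schools per town and 40 students per school is entirely hypothetical, as are the numbers drawn at each stage and the function name.

```python
import random


def multistage_sample(towns, n_towns, n_schools, n_students, rng):
    """Simple random sampling applied separately at each of three stages."""
    sample = []
    # First stage: a few towns chosen at random from all the towns.
    for town in rng.sample(sorted(towns), n_towns):
        schools = towns[town]
        # Second stage: schools chosen at random within each chosen town.
        for school in rng.sample(sorted(schools), n_schools):
            # Third stage: students chosen at random within each chosen school.
            sample.extend(rng.sample(schools[school], n_students))
    return sample


# Hypothetical sampling frame: town -> school -> list of class X students.
towns = {
    f"town{t}": {
        f"town{t}-school{s}": [
            f"town{t}-school{s}-student{k}" for k in range(1, 41)
        ]
        for s in range(1, 6)
    }
    for t in range(1, 11)
}

rng = random.Random(2)  # arbitrary seed
sample = multistage_sample(towns, n_towns=3, n_schools=2, n_students=5, rng=rng)
print(len(sample))  # 3 towns x 2 schools x 5 students = 30
```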


(d) Systematic sampling : 

Where all members of a population have 
already been arranged and numbered serially in 
alphabetical, chronological, registration- or 
[...] or some other systematic 
order, the first member for the sample may be 
chosen purely at random. This may be followed 
by including in the sample other members 
occurring at randomly predetermined regular 
intervals along the existing serial order. Thus, such 
a systematic or fixed interval sample of students 
may be drawn from a class by choosing at random, 
say, Roll No. 4 as the first individual and then 



drawing, say, every subsequent seventh Roll No. 
Though apparently a random method, it may yield 
a biased sample if the initial serial order has been 
based on some property related to the variable to 
be investigated. 
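
The roll-number illustration above may be sketched in Python as follows. The class size of 40 is an assumption made for the example, and restricting the random starting point to the first interval is a common convention rather than something stated in the text.

```python
import random


def systematic_sample(serial_numbers, start, interval):
    """Take the member numbered `start`, then every `interval`-th member after it."""
    first = serial_numbers.index(start)
    return serial_numbers[first::interval]


roll_numbers = list(range(1, 41))  # hypothetical class of 40 students
chosen = systematic_sample(roll_numbers, start=4, interval=7)
print(chosen)  # [4, 11, 18, 25, 32, 39]

# In practice the starting roll number would itself be drawn at random,
# here from the first `interval` roll numbers (an assumed convention).
rng = random.Random(3)  # arbitrary seed
interval = 7
random_start = rng.choice(roll_numbers[:interval])
random_sample = systematic_sample(roll_numbers, random_start, interval)
```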

(e) Purposive sampling : 

In this method, random sampling is done from 
only a limited section of the population, 
considered by the investigator to represent the 
entire population faithfully with respect to the 
variable under study. But such a sample may be 
biased if the assumption about the representative 
nature does not hold good for the relevant section 
of population. 


Example 1.4.1. 

Draw a sample of 11 scores by random sampling without replacement, from the following 48 body weight 
(kg) scores, using the random numbers provided below the data. 

Scores : 

68, 59, 57, 64, 52, 60, 62, 57, 60, 62, 57, 61, 63, 47, 55, 50, 59, 71, 70, 49, 56, 67, 82, 56, 
45, 73, 66, 46, 72, 44, 58, 62, 53, 78, 40, 43, 60, 54, 76, 81, 38, 77, 41, 47, 66, 65, 72, 70. 


Random numbers : 

4227 1701 7616 3200 7336 3974 7025 
4108 8414 8116 6688 7061 0344 4348 

Solution : 



(a) As the total number of scores has two digits (n = 48), identity numbers of two digits are allotted to the 
scores. Proceeding serially from the first to the last score of the data, all the scores are first allotted one identity 
number each (viz., 01 to 48); then, the allotment of identity numbers (viz., 49 to 96) is continued a second time 
starting again from the first to the last score. 

Table 1.1. Allotment of identity numbers (IN) to the scores. 


Scores  68 59 57 64 52 60 62 57 60 62 57 61 63 47 55 50 59 71 70 49 56 67 82 56 
IN-1    01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
IN-2    49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 

Scores  45 73 66 46 72 44 58 62 53 78 40 43 60 54 76 81 38 77 41 47 66 65 72 70 
IN-1    25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 
IN-2    73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 


(b) Any digit of any set of the provided random numbers is then chosen at random to start with ; here the first 
digit (viz., 1) of the second set (viz., 1701) of the first row is so chosen. Starting from this, groups of two 
consecutive digits each are recorded from the random number sets proceeding serially along their rows. A 
random number is rejected if (i) it is higher than the highest identity number (viz., 96) of the second series, (ii) 
the random number is 00, and (iii) the random number has already been recorded once earlier. Thus, the recorded 
random numbers are: 

17, 01, 76, 16, 32, 00 (rejected), 73, 36, 39, 74, 70, 25, 41, 08, 84, 14, 81, 16 (rejected), 66. 

(c) The recorded random numbers are replaced by the corresponding identity numbers (IN). If a random 
number coincides with a first-series IN, the latter is retained ; if a random number is higher than the total 
number of scores (viz., 48) and thus coincides with a second-series IN, it is replaced by the corresponding 
first-series IN. Identity numbers which have already come once earlier should also be rejected, because the 
sampling is being done here without replacement. Thus, the final identity numbers chosen are as follows: 

17, 01, 28, 16, 32, 25, 36, 39, 26, 22, 25 (rejected), 41, 08, 36 (rejected), 14, 33, 18. 

(d) The sample size being 11, the scores bearing the first 11 chosen identity numbers are included in the 
sample and given below. (If any score would have occurred a second time, it would have been rejected.) 

59, 68, 46, 50, 62, 45, 43, 76, 73, 67, 38. 
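
The steps of this solution can be retraced with a short Python sketch. The function name and its arguments are hypothetical. The rejection of repeated random numbers in step (b) and of repeated identity numbers in step (c) is merged into a single check, which gives the same result because a repeated random number always maps to a repeated identity number.

```python
SCORES = [
    68, 59, 57, 64, 52, 60, 62, 57, 60, 62, 57, 61,
    63, 47, 55, 50, 59, 71, 70, 49, 56, 67, 82, 56,
    45, 73, 66, 46, 72, 44, 58, 62, 53, 78, 40, 43,
    60, 54, 76, 81, 38, 77, 41, 47, 66, 65, 72, 70,
]
RANDOM_ROWS = [
    "4227 1701 7616 3200 7336 3974 7025",
    "4108 8414 8116 6688 7061 0344 4348",
]


def draw_identity_numbers(rows, start, population_size, sample_size):
    """Read digit groups along the rows and return first-series identity numbers.

    `start` is the 0-based position of the starting digit once the spaces
    are removed. Two series of identity numbers are assumed, as in the
    example, so numbers above 2 * population_size are rejected.
    """
    digits = "".join(rows).replace(" ", "")
    width = len(str(population_size))  # two digits for 48 scores
    chosen = []
    for i in range(start, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        if number == 0 or number > 2 * population_size:
            continue  # 00, or higher than the last second-series number
        # Map 49..96 back to 01..48; 96 maps to 48 and not to 0.
        identity = (number - 1) % population_size + 1
        if identity in chosen:
            continue  # sampling without replacement
        chosen.append(identity)
        if len(chosen) == sample_size:
            break
    return chosen


# The first digit of the second set (1701) of the first row is at position 4.
identities = draw_identity_numbers(RANDOM_ROWS, start=4,
                                   population_size=48, sample_size=11)
sample = [SCORES[i - 1] for i in identities]
print(identities)  # [17, 1, 28, 16, 32, 25, 36, 39, 26, 22, 41]
print(sample)      # [59, 68, 46, 50, 62, 45, 43, 76, 73, 67, 38]
```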


1.5 PARAMETER AND STATISTIC 

Parameter 

A parameter is a summary value or numerical 
index like mean, standard deviation, median or 
variance of a variable for the entire population. 
As an entire population is not usually subjected 
to experimental methods, a parameter is not 
directly worked out or precisely known in most 
cases. However, measures like mean and median 
of a sample are frequently straightway used as 
the estimates of the corresponding parameters of 
the population from which the sample has been 
drawn (point estimates). Moreover, a narrow range 
of scores, called a confidence interval, is 
sometimes worked out from the measures like 
mean and standard deviation of the scores of a 
sample with a stipulated probability of the 
parameter falling within that interval (interval 
estimates). A parameter is not expected to change 
so long as the population remains unaltered. 

Statistic 

A statistic (plural : statistics) is a summary 
value or numerical index like mean, median, 
standard deviation or variance of the scores of a 
variable in a sample. It can be directly worked out 
from the measured scores of the variable in the 
sample, and can be used either directly as a point 
estimate of the corresponding population 
parameter, or in working out a confidence interval 
as an interval estimate of the latter. A statistic 
varies from sample to sample of the same 
population due to different individual scores of 
different samples. A statistic frequently differs 


from the corresponding parameter, such a 
difference being known as the sampling error. The 
values of a statistic for different samples from the 
same population are scattered around the value 
of the corresponding population parameter to form 
a sampling distribution of that statistic. Statistics 
may be classified as follows. 


Table 1.2. Symbols for a few statistics and parameters. 


Measure                    Statistic    Parameter 

Mean                       X̄            μ 
Standard deviation         s, s_x       σ, σ_x 
Variance                   s²           σ² 
Correlation coefficient    r            ρ 
Standard error of mean     s_X̄ 



(a) Descriptive statistics : 

These statistics describe the properties of a 
sample with respect to the given variable or 
variables. They include mean, median, mode, 
percentiles, standard deviation, variance, 
coefficient of variation, correlation coefficient, etc. 
They belong to the following three categories 
according to the property of the sample described 
by them. 

(i) Statistics of location : They give the location 
of either a central position or some other specific 
point of the frequency distribution of scores in a 
sample on the scale of the variable. Statistics of 
location have two subclasses : (a) measures of 
central values such as mean, median and mode 
which give some central scores of the frequency 


distribution, around which other individual scores 
of the sample tend to collect, and (b) fractiles such 
as percentiles and quartiles which give some 
points of the distribution below which lie specific 
fractions or percentages of the total number of 
scores. 

(ii) Statistics of dispersion : These are measures 
of the spread or scatter of the scores of a sample 
around a central value like mean or median. They 
are further classified as follows : (a) absolute 
measures of dispersion such as standard deviation, 
variance and quartile deviation, computed directly 
from the raw scores of the sample and expressed 
in either the same unit as the scores, or some power 
of that unit, and (b) relative measures of dispersion 
such as coefficient of variation and coefficient of 
quartile deviation, which consist of expressions 
of some absolute measures of dispersion as ratios 
or percentages of the corresponding statistics of 
location, and do not bear the units of the raw 
scores. 

(iii) Statistics of correlation : These measure 
and describe the magnitude and direction of 
relationship between two or more variables in the 
individuals of a sample, e.g., between body height 
and weight, wing length and trunk length, cardiac 
output and venous return, anxiety test scores and 
performance, and intelligence test scores and 
achievement scores. Correlation coefficients such 
as product-moment correlation coefficient and 
rank-difference correlation coefficient belong to 
this class. 

(b) Sampling or inferential statistics : 

These include statistics like standard errors 
which are not restricted within the limits of a 
sample unlike the descriptive statistics, rather go 
beyond the sample and help to make inferences 
and generalize them from the sample to the entire 
population. They find applications in testing 
hypotheses, finding the significance of differences 
between statistics of different samples and 
working out confidence intervals of parameters. 

(c) Prediction statistics : 

These include statistics such as regression 
coefficients and beta coefficients, used for 
predicting the most likely score of a dependent 
variable (criterion) in an individual from his actual 
score(s) in one or more independent variables 
(predictors). Examples : predictions of workshop 
skill scores from mechanical aptitude test scores, 
plasma prothrombin concentration from blood 
clotting time, body surface area from body height 
and weight, and O2 consumption from tracheal 
ventilation. 
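
The ideas of parameter, statistic, sampling error and sampling distribution from this section may be illustrated with the data of Example 1.4.1. The Python sketch below treats the 48 scores as if they were the entire population, which is an assumption made only for the illustration. The number of repeated samples (2000) and the seed are also arbitrary.

```python
import random
import statistics

SCORES = [
    68, 59, 57, 64, 52, 60, 62, 57, 60, 62, 57, 61,
    63, 47, 55, 50, 59, 71, 70, 49, 56, 67, 82, 56,
    45, 73, 66, 46, 72, 44, 58, 62, 53, 78, 40, 43,
    60, 54, 76, 81, 38, 77, 41, 47, 66, 65, 72, 70,
]
EXAMPLE_SAMPLE = [59, 68, 46, 50, 62, 45, 43, 76, 73, 67, 38]

# Parameter: mean of the whole (assumed) population of 48 scores.
population_mean = statistics.fmean(SCORES)
# Statistic: mean of the sample of 11 scores drawn in Example 1.4.1.
sample_mean = statistics.fmean(EXAMPLE_SAMPLE)
# Sampling error: difference between the statistic and the parameter.
sampling_error = sample_mean - population_mean
print(population_mean, sample_mean, sampling_error)  # 59.8125 57.0 -2.8125

# Sampling distribution: means of many samples of 11, each drawn
# without replacement, scatter around the population mean.
rng = random.Random(4)  # arbitrary seed
sample_means = [statistics.fmean(rng.sample(SCORES, 11)) for _ in range(2000)]
mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)
print(round(mean_of_means, 2), round(spread_of_means, 2))
```

The average of the sample means is expected to lie close to the population mean, while any single sample mean, such as 57.0 here, may differ from it by a few units.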


GLOSSARY 

parameter : any summary value or numerical index such as mean, standard deviation, standard error 
and correlation coefficient of all the scores of a variable or variables in the entire population. 

population : the aggregate of all such individuals, cases or events in the universe as either possess a 
given form or belong to a given class of the specific variable investigated in a test, experiment or survey. 

sample : a relatively small group of individuals, cases or events, chosen from a population for being subjected to an 
experiment, test or survey instead of that population, as its representative. 

sampling : drawing of a sample by choosing individuals or cases from a population. 

sampling, judgement : sampling by the deliberate and arbitrary choice of some individuals or cases from the 
population, depending on the personal judgement of the investigator. 

sampling, multistage : random sampling at each of several stages formed by successive stepwise 
divisions of a vast widespread population. 



sampling, probability : random choice of individuals or cases for a sample from the population, depending on 
the laws of probability. 

sampling, purposive : random probability sampling from only such a limited section of the population as is 
adjudged by the investigator to represent the entire population. 

sampling, simple random : random probability sampling from the undivided whole of a small, finite and 
homogeneous population. 

sampling, stratified random : simple random samplings separately from each of the strata or subpopulations of 
a large heterogeneous population to ensure the random inclusion of individuals from each stratum into 
the sample. 

sampling, systematic : choice of individuals for a sample at randomly predetermined intervals along a 
prearranged serial order of all members of the population. 

sampling distribution : distribution of values of a statistic of different samples around the corresponding 
parameter. 


sampling error : difference between a statistic and the corresponding parameter. 

scale, interval: scale for a measurement variable like temperature, having no true zero point, starting consequently 
from any convenient arbitrary zero, and making it mathematically impermissible to compute a ratio of 
any two scores of such a scale. 


scale, ratio : scale for a measurement variable such as length, mass and volume, having a true zero point and 
enabling the computation of a ratio between any two scores of such a scale. 

statistic : any summary value or numerical index such as mean, standard deviation, quartile, quartile deviation 

and standard error, worked out from the scores of a variable in a sample and used as an estimate of the 
corresponding population parameter. 


statistics : (a) plural of ‘statistic’ ; (b) the science of application of mathematical principles in the collection, 
presentation and analysis of the data of an experiment, test or survey, for interpretation and inference. 

statistics, descriptive: statistics such as mean, standard deviation and correlation coefficient which describe the 
properties of a sample with respect to the relevant variable(s). 

statistics, prediction : statistics such as regression coefficients and beta coefficients that are used in predicting 
the most likely value of one vanable from the known values of one or more other variables. 

statistics, sampling : statistics such as standard errors which go beyond a single sample to estimate sampling 
[...] the magnitude and [...] of [...] 

statistics of dispersion : descriptive statistics such as standard deviation, variance and coefficient of variation, 
which measure the dispersion of individual scores of a sample around a central value like mean 
or median. 

statistics of location : descriptive statistics such as mean, median and mode, which describe the location of a 
specific point of the frequency distribution of sample scores on the scale of the variable. 

variable : [...] quantity or quality either spatially, i.e., from case to 
case or place to place even at the same time, or temporally, i.e., from time to time even at the same 
location or in the same individual. 




variable, classification : such an independent variable of an experiment as is beyond the control of the investigator 
and may thus suffer from random changes during the experiment. 

variable, continuous : a variable that can have values, not only in whole numbers of units, but also in fractional 
units between two whole units. 

variable, dependent : the variable whose possible change on exposure to specific independent variable(s) is 
being studied in an experiment or test. 

variable, discontinuous : a variable that can have only certain specific values (usually in whole numbers of 
units only) and cannot possess intermediate values such as values in fractional units. 

variable, independent : variable(s) whose possible effects on a dependent variable are being explored in a given 
experiment or test. 

variable, measurement : variable whose magnitude can be measured or counted and expressed in numerical 
values. 

variable, nominal: a variable such as sex, blood group, race and caste, which cannot be measured, counted or 
expressed quantitatively in numerical values, nor can the individuals of a sample be ranked with respect 
to such a variable. 

variable, ordinal : a variable such as ferocity, timidity, personality, attentiveness and leadership quality, that 
varies distinctly in magnitude from individual to individual in a sample, enabling them to be ranked 
accordingly, but cannot be measured quantitatively in the individuals. 

variable, relevant : any variable, not intended to be used by the investigator for studying its effect on the 
dependent variable, but nevertheless affecting the latter if not effectively controlled by the investigator. 

variable, sequence-relevant: relevant variable(s) arising from variations in the sequence or order of application 
of the doses or levels of independent variable(s) in an experiment. 

variable, situational relevant: relevant variable(s) arising from experimental situations and environment, such 
as variations in pH, temperature, light intensity, noise, humidity, work space, supervision in workshop 
etc. 

variable, subject-relevant : relevant variable(s) arising from variations of individuals of the sample, such as 
their differences in age, sex, body weight, motivation and intelligence. 

variable, treatment : independent variable(s), strictly controlled by the investigator and not suffering from 
random changes during the experiment. 








2. PRESENTATION OF DATA 


Mere inspection of numerical scores of 
untreated raw data of an experiment or survey 
elicits little meaningful information and fails to 
help in the interpretation of observations. For 
understanding and analyzing the data, and for 
comparison with other sets of data, the raw data 
are systematically arranged, and graphically 
represented. 

2.1 FREQUENCY DISTRIBUTIONS 

A frequency distribution shows the number 
(frequency) of individuals, cases, events or scores 
of a sample, arranged and tabulated in the classes 
into which the variable under investigation has 
been classified. 

Qualitative frequency distribution 

A qualitative frequency distribution shows the 
number of individuals or cases of a sample in 
different classes of a nominal variable such as 
blood group, sex, phenotype, eye color, fur color, 
race, religion and occupation (Table 2.1). A 
nominal variable cannot be measured quantitatively 
in numerical units ; so, it is divided into 
classes depending only on the qualitative 
distinction between the individuals according to 
the presence or absence of a particular property 
or characteristic. The classes are, therefore, located 
only at specific points on the scale of the variable, 
separated from each other by intervening 
gaps ; no case falls in the gap between two such 
points. Such frequency distributions are thus 
known as point distributions. Classes of such a 
variable have no numerical range and are entered 
in one column of a simple frequency table while 
the counted frequency of individuals in each class 
is entered in another column against that particular 
class. 


Table 2.1. A qualitative frequency distribution of blood 
groups in a sample. 

Blood group    Frequency (f) 

A                40 
B               164 
O                24 
AB               26 

Total (n)       254 
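
A qualitative frequency distribution of this kind can be tallied from raw observations with a few lines of Python. In the sketch below the list of individual blood groups is hypothetical and has been constructed only so that its counts agree with Table 2.1.

```python
from collections import Counter

# Hypothetical raw observations, built to match the counts of Table 2.1.
observations = ["A"] * 40 + ["B"] * 164 + ["O"] * 24 + ["AB"] * 26

# Count the individuals falling in each class of the nominal variable.
frequency = Counter(observations)
total = sum(frequency.values())

# Print the simple two-column frequency table.
print(f"{'Blood group':<14}{'Frequency (f)':>14}")
for group in ("A", "B", "O", "AB"):
    print(f"{group:<14}{frequency[group]:>14}")
print(f"{'Total (n)':<14}{total:>14}")
```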


Quantitative frequency distribution 
This consists of a table showing the number 
(frequency) of individuals or cases of a sample in 
different classes or groups, into which the entire 
range of scores (numerical values) of a measurement 
variable has been divided. The data, arranged 
into such a frequency distribution, are called 
grouped data because the scores are distributed 
among the classified groups. Such classification 
reveals salient features of the variable in the 
sample in a meaningful way, e.g., the class having 
the highest frequency, and the pattern of 
distribution in different classes. It also helps in 
the statistical analysis and interpretation of data, 
particularly of a large sample. 

1. Frequency distribution of a continuous 
variable : 

There is no real gap between the classes of 
such a variable; the consecutive classes are
continuous with each other. 

(a) The total range of scores is worked out 
from the highest and lowest observed scores. 

Range = (highest score) - (lowest score) + 1. 

(b) To divide the total range into a suitable
number of class intervals, the class size or range
of scores of each class is so chosen that (i) each
class interval covers about 3, 5 or 10 scores, the
choice depending on the total range of scores and
the sample size, (ii) the class size should
preferably be the same for all the class intervals,
(iii) the total range is thus divided into 5-20 class
intervals of equal size, (iv) the lowest and the
highest scores of the data must fall within the
lowest and the highest intervals respectively, and
(v) the incidence of too many or too few cases in
a class is avoided: details are obscured by too
long class intervals with too many cases in some
of them, while the purpose of a meaningful
arrangement of the data is defeated by too short
class intervals with no or few cases in most of
them.

(c) Score limits of each class, viz., its highest 
and lowest scores, should be stated precisely to 
avoid confusion in entering the observed scores 
in the respective intervals. In Table 2.3, for 
example, the sixth and seventh class intervals are 
given in an ascending order as 66-68 and 69-71 
respectively. So, there should be no confusion 
about a score 69 belonging to the seventh interval 
only. 

(d) In a continuous series of data, there should 
be no gap between either the limits of successive
intervals or the consecutive scores. Each score is 
thus deemed to occupy a unit interval extending 
from 0.5 below that score to 0.5 above the latter. 
For example, the score 66 should be taken to 
extend from 65.5 to 66.5 and the score 67 from 
66.5 to 67.5 so that no gap is left between 66 and 
67. Therefore, true class limits or class boundaries 
like 65.5-68.5 and 68.5-71.5 should be used 
instead of the score limits like 66-68 and 69-71, 
respectively (Table 2.3). Each class interval then
is not separated by gaps from the neighbouring 
class intervals. 

True lower limit: $X_l = \tfrac{1}{2}\,[(\text{lower score limit of the interval}) + (\text{upper score limit of the next lower interval})]$.

True upper limit: $X_u = \tfrac{1}{2}\,[(\text{upper score limit of the interval}) + (\text{lower score limit of the next higher interval})]$.

(e) The class size ($i$) of an interval is obtained
from the difference between either its true limits,
or the upper (or lower) score limits of consecutive
intervals. Thus,

Either, $i = X_u - X_l$;

or, $i = (\text{upper score limit of one interval}) - (\text{upper score limit of the next lower interval})$.

(f) Class intervals may be tabulated either with
the lowest interval at the bottom and the highest
one at the top, or with the lowest interval at the
top and the highest one at the bottom.

(g) In such a grouped frequency distribution,
all individual cases of an interval are assumed to
possess the score identical with the midpoint ($X_c$) of that
interval. Where $X_u$ and $X_l$ are the true upper and
lower limits respectively of the interval:

either $X_c = \tfrac{1}{2}(X_l + X_u)$;

or, $X_c = \tfrac{1}{2}\,[(\text{lower score limit}) + (\text{upper score limit})]$.

For example, for a class interval of 66-68
(65.5-68.5), either $X_c = \tfrac{1}{2}(65.5 + 68.5) = 67$, or
$X_c = \tfrac{1}{2}(66 + 68) = 67$.

(h) After tabulating the limits and midpoints
of each interval, each individual case is entered
as a tally against the interval to which it belongs;
every fifth case in a class is entered as a diagonal
tally across the last four tallies in that class (Table
2.3). Individual scores, coinciding with true limits
of intervals, are all entered either in the intervals
with true lower limits identical with the respective
scores, or in the intervals with true upper limits
coinciding with the respective scores. After all
scores of the sample have been so entered as tallies
against the respective intervals, the total frequency
$f$ in each class interval is entered in the table. The
frequencies of all the intervals are finally totalled
to give the sample size.

(i) When a few scores are scattered over wide
ranges at one or both ends of the distribution, they
are often grouped together in a single class interval
with an indeterminate lower or upper limit (open
class interval) with a range such as '15 and below'
or '80 and above' for the lowest and the highest
classes respectively. No midpoint can be computed
for open class intervals. Such frequency
distributions are called incomplete distributions.
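
The relations between score limits, true limits, class size and midpoint can be checked with a short sketch. It is illustrative only: the function names are assumptions introduced here, and the gap between neighbouring score limits is assumed to be the same throughout (1 unit for whole-number scores).

```python
def true_limits(score_limits):
    """Given consecutive (lower, upper) score limits in ascending order,
    return the (true lower, true upper) limits of each interval.

    Each true limit lies half a gap beyond the corresponding score limit,
    which equals the mean of the two neighbouring score limits.
    """
    gap = score_limits[1][0] - score_limits[0][1]  # e.g. 69 - 68 = 1
    return [(lo - gap / 2, hi + gap / 2) for lo, hi in score_limits]

def class_size(true_lower, true_upper):
    """Class size i = X_u - X_l."""
    return true_upper - true_lower

def midpoint(lower, upper):
    """Midpoint X_c, from either the score limits or the true limits."""
    return (lower + upper) / 2

intervals = [(66, 68), (69, 71)]      # score limits quoted in the text
limits = true_limits(intervals)
print(limits)                         # [(65.5, 68.5), (68.5, 71.5)]
print(class_size(*limits[0]))         # 3.0
print(midpoint(66, 68), midpoint(*limits[0]))  # 67.0 67.0
```

The midpoint comes out the same whichever pair of limits is used, because the true limits extend the score limits equally on both sides.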

2. Frequency distribution of a discrete or 
discontinuous variable: 

Because consecutive scores of a discrete 
variable like RBC count and family size are 
separated by real gaps (page 3), the classes of such 
a variable are separated by intervening gaps. So, 
a discrete frequency distribution has its class
intervals with score limits in whole units only, and 
with no true class limit in fractional or decimal 
units. This leaves gaps between the classes with 
no score in these gaps. 

Where scores of a discrete variable are
relatively few, each occurring once or only a few
times in the sample, their frequencies are arranged
in an ungrouped frequency distribution called a
simple frequency table. Each single distinct score
is entered individually in this table as if to
constitute a class by itself; no class interval is
formed here by grouping a range of more than
one score. The frequency of a particular score is
entered directly against it and not in any grouped
interval of a range of scores. In the case cited in
Table 2.2, no family can exist with a fractional
number of children like 2.5 or 3.3, real gaps exist
between the whole numbers of children, and the
frequencies are entered against individual scores,
not against grouped intervals.


Table 2.2. A simple frequency table for a discrete
frequency distribution of families in terms
of the number of children.

Number of children     Number of families
0                         7
1                        35
2                        67
3                        43
5                        10
Total                   162 (n)


Example 2.1.1.

Tabulate the following scores of body weight (kg) in a sample in the form of a frequency distribution :

68, 59, 57, 64, 52, 60, 62, 57, 61, ol. 1. M. 55, 54, 65, 67, 54, 62, 58, 60, 54, 62, 65, 71,

63, 60, 61, 56, 67, 64, 57, 61, 60, 62, 59, 57, 64, 58, 61, 63, 62, 62, 60, 58, 67, 63, 64, 57,

61, 60, 65, 67, 70, 58, 51, 61, 62, 65, 52, 60, 55, 63, 62, 60, 67, 55, 62, 61, 64, 59.


Solution : 

Sample size (n) = 70. 

Highest score = 71. Lowest score = 51. Range = (71 - 51) + 1 = 21. 

Proposed size of interval ($i$) = 3. Number of classes: $k = \dfrac{\text{range}}{\text{interval size}} = \dfrac{21}{3} = 7$.


Table 2.3. Tabulation of a continuous frequency distribution of body weight scores.

Class intervals
Score limits   True limits   Midpoints (X_c)   Tallies                          Frequency (f)
51-53          50.5-53.5     52                ||||                               4
54-56          53.5-56.5     55                ||||/ ||                           7
57-59          56.5-59.5     58                ||||/ ||||/ ||                    12
60-62          59.5-62.5     61                ||||/ ||||/ ||||/ ||||/ ||||/     25
63-65          62.5-65.5     64                ||||/ ||||/ |||                   13
66-68          65.5-68.5     67                ||||/ |                            6
69-71          68.5-71.5     70                |||                                3
Total                                                                            70 (n)


Body weight being a continuous variable, a continuous frequency distribution is tabulated in Table 2.3.

(a) The true lower and upper class limits ($X_l$ and $X_u$) of each interval are calculated and entered. For example,
for the interval 57-59,

$X_l = \tfrac{1}{2}\,[(\text{lower score limit of the interval}) + (\text{upper score limit of the next lower interval})] = \tfrac{1}{2}(57 + 56) = 56.5$.

$X_u = \tfrac{1}{2}\,[(\text{upper score limit of the interval}) + (\text{lower score limit of the next higher interval})] = \tfrac{1}{2}(59 + 60) = 59.5$.

(b) The midpoint $X_c$ of each interval is computed by the following formula:

$X_c = \tfrac{1}{2}\,[(\text{lower score limit}) + (\text{upper score limit})]$

For example, for the interval 57-59,

$X_c = \tfrac{1}{2}(57 + 59) = 58$.

(c) Each individual score is entered as a tally against its class interval, each fifth score of a class being entered
as a diagonal tally across the preceding four tallies in that interval. The total number of tallies in each class is
then entered as the frequency ($f$) of that class.


Example 2.1.2. 

Tabulate the following femur length scores (mm × 10⁻²) of a sample of aphids in a frequency distribution:

31, 36, 37, 32, 43, 40, 34, 42, 36, 35, 39, 44, 37, 42, 44, 35, 45, 40, 31, 33,

43, 34, 38, 39, 37, 35, 37, 36, 42, 49, 46, 38, 36, 31, 48, 37, 36, 37, 40, 38.

Solution: 

Sample size (n) = 40. 

Highest score = 49. Lowest score = 31. Range = (49 - 31) + 1 = 19. 


Proposed interval size ($i$) = 3. Number of classes $= \dfrac{\text{range}}{\text{interval size}} = \dfrac{19}{3} = 6.33 \approx 7$ (rounded up to the next whole number).


Table 2.4. Tabulation of a continuous frequency distribution of aphid femur length scores.

Class intervals
Score limits   True limits   Midpoints (X_c)   Tallies            Frequency (f)
31-33          30.5-33.5     32                ||||/                5
34-36          33.5-36.5     35                ||||/ ||||/         10
37-39          36.5-39.5     38                ||||/ ||||/ |       11
40-42          39.5-42.5     41                ||||/ |              6
43-45          42.5-45.5     44                ||||/                5
46-48          45.5-48.5     47                ||                   2
49-51          48.5-51.5     50                |                    1
Total                                                              40 (n)



(a) The true lower and upper class limits ($X_l$ and $X_u$) of each interval are worked out and entered in Table 2.4.
For example, for the interval 40-42,

$X_l = \tfrac{1}{2}\,[(\text{lower score limit of the interval}) + (\text{upper score limit of the next lower interval})] = \tfrac{1}{2}(40 + 39) = 39.5$.

$X_u = \tfrac{1}{2}\,[(\text{upper score limit of the interval}) + (\text{lower score limit of the next higher interval})] = \tfrac{1}{2}(42 + 43) = 42.5$.

(b) The midpoint ($X_c$) of each interval is worked out and entered. For example, for the interval 40-42,

$X_c = \tfrac{1}{2}\,[(\text{lower score limit}) + (\text{upper score limit})] = \tfrac{1}{2}(40 + 42) = 41$.

(c) Each individual score is entered as a tally against its class interval, each fifth score of a class being entered
as a diagonal tally across the preceding four tallies in that interval. The total number of tallies in each class is
entered as the frequency ($f$) in that class.
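
The whole tabulation of Example 2.1.2 can be retraced programmatically. The sketch below is only one possible way of doing so; the function name `grouped_distribution` and the dictionary keys are assumptions made for illustration, and the scores are assumed to be whole numbers so that true limits lie 0.5 beyond the score limits.

```python
import math

# Femur length scores (mm x 10^-2) of the 40 aphids in Example 2.1.2.
scores = [
    31, 36, 37, 32, 43, 40, 34, 42, 36, 35, 39, 44, 37, 42, 44, 35, 45, 40, 31, 33,
    43, 34, 38, 39, 37, 35, 37, 36, 42, 49, 46, 38, 36, 31, 48, 37, 36, 37, 40, 38,
]

def grouped_distribution(scores, size):
    """Tabulate whole-number scores into equal class intervals of `size` scores."""
    lowest, highest = min(scores), max(scores)
    score_range = highest - lowest + 1        # Range = highest - lowest + 1
    k = math.ceil(score_range / size)         # number of classes, rounded up
    rows = []
    for j in range(k):
        lower = lowest + j * size             # lower score limit
        upper = lower + size - 1              # upper score limit
        rows.append({
            "score_limits": (lower, upper),
            "true_limits": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "frequency": sum(lower <= x <= upper for x in scores),
        })
    return score_range, k, rows

score_range, k, rows = grouped_distribution(scores, size=3)
print(score_range, k)                         # 19 7
for row in rows:
    print(row)
```

The frequencies obtained this way (5, 10, 11, 6, 5, 2, 1) agree with Table 2.4 and add up to the sample size of 40.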

Example 2.1.3. 

Tabulate the following respiratory rate scores (per minute) of a sample of orang-utans in a frequency 
distribution: 

21, 16, 8, 12, 14, 10, 17, 17, 20, 13, 15, 19, 25, 16, 15, 23, 14, 18, 16, 17. 

Solution : 

Sample size ($n$) = 20.

Highest score = 25. Lowest score = 8. Range = (25 - 8) + 1 = 18.

Proposed interval size ($i$) = 4. Number of classes $= \dfrac{\text{range}}{\text{interval size}} = \dfrac{18}{4} = 4.5 \approx 5$ (rounded up to the next whole number).


Table 2.5. Tabulation of a discontinuous frequency distribution of orang-utan respiratory rate scores.

Class intervals     Midpoints (X_c)   Tallies        Frequency (f)
(Score limits)
8-11                 9.5              ||               2
12-15               13.5              ||||/ |          6
16-19               17.5              ||||/ |||        8
20-23               21.5              |||              3
24-27               25.5              |                1
Total                                                 20 (n)


Respiratory rate being a discontinuous variable, the scores are arranged in a discrete frequency distribution 
(Table 2.5). 

(a) No true limit is worked out for any class interval because the variable is a discrete one. 


(b) The midpoint ($X_c$) of each interval is worked out and entered. For example, for the interval 16-19,

$X_c = \tfrac{1}{2}\,[(\text{lower score limit}) + (\text{upper score limit})] = \tfrac{1}{2}(16 + 19) = 17.5$.

(c) Each individual score is entered as a tally against its class interval, each fifth score of a class being entered
as a diagonal tally across the preceding four tallies in that interval. The total number of tallies in each class is
entered as the frequency ($f$) in that class.
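
The tallying step of Example 2.1.3 may be sketched as follows. This is an illustration under stated assumptions: the helper names `tally` and `discrete_table` are introduced here, and a bundle of five is written as `||||/`, the slash standing for the diagonal fifth tally.

```python
import math

# Respiratory rate scores (per minute) of the 20 orang-utans in Example 2.1.3.
scores = [21, 16, 8, 12, 14, 10, 17, 17, 20, 13,
          15, 19, 25, 16, 15, 23, 14, 18, 16, 17]

def tally(frequency):
    """Write a frequency as tally marks, bundling every five cases."""
    bundles, rest = divmod(frequency, 5)
    parts = ["||||/"] * bundles
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)

def discrete_table(scores, size):
    """Group discrete scores into intervals stated by score limits only."""
    lowest = min(scores)
    k = math.ceil((max(scores) - lowest + 1) / size)
    rows = []
    for j in range(k):
        lower = lowest + j * size
        upper = lower + size - 1
        f = sum(lower <= x <= upper for x in scores)
        rows.append((f"{lower}-{upper}", (lower + upper) / 2, tally(f), f))
    return rows

rows = discrete_table(scores, size=4)
for interval, x_c, marks, f in rows:
    print(f"{interval:<8}{x_c:>6}  {marks:<12}{f:>3}")
```

No true limits are computed here, in keeping with the treatment of a discrete variable in Table 2.5.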


2.2 PIE DIAGRAM 

Pie diagrams are used for the graphical
representation of frequency distributions of
nominal variables. The diagram consists of a circle
(pie) with its entire area representing the sample
size or total frequency (n) of cases. The angle 360°
at its centre is divided by a protractor to cut the
circle into segments, each proportional in area to
the relative frequency of cases in a class of the
variable. The angle θ to be cut off from 360° for
a segment is given by: $\theta = 360^\circ \times f/n$, where $f$ is
the frequency of cases in the relevant class and $n$
is the sample size. The areas of different segments
thus represent graphically the proportions of the
net frequency (n) in the respective classes. But
pie diagrams are inconvenient for comparing more
than one sample because (i) separate pie diagrams
have to be used for different samples, and (ii) if
there are either too many classes or too low
frequencies in some classes, the segments for such
classes are too narrow for precise drawing and
correct comparison.


Example 2.2.1. 

Draw a pie diagram for the following frequency distribution of blood groups in a sample.

Blood groups :    O     A     B    AB    Total

Frequencies  :  258   172   387    43     860

Solution:

The entire 360° of the pie represents the total frequency (n) of 860. Where θ is the angle of a segment and $f$
is the frequency of cases in the class represented by that segment,

$\theta = 360^\circ \times \dfrac{f}{n}$;

∴ for O group, $\theta = 360^\circ \times \dfrac{258}{860} = 108^\circ$; for A group, $\theta = 360^\circ \times \dfrac{172}{860} = 72^\circ$;

for B group, $\theta = 360^\circ \times \dfrac{387}{860} = 162^\circ$; for AB group, $\theta = 360^\circ \times \dfrac{43}{860} = 18^\circ$.

Using a protractor, segments of 108°, 72°, 162° and 18° are cut off successively from the circle to represent
the frequencies of O, A, B and AB groups, respectively (Fig. 2.1). The segments are shaded or colored distinctly
from each other and labelled.
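
The angles of Example 2.2.1 can be verified with a few lines of code. The sketch is illustrative; the function name `pie_angles` is an assumption introduced here.

```python
def pie_angles(frequencies):
    """Return the segment angle (degrees) of each class: 360 x f / n."""
    n = sum(frequencies.values())
    return {cls: 360 * f / n for cls, f in frequencies.items()}

# Blood-group frequencies of Example 2.2.1.
blood_groups = {"O": 258, "A": 172, "B": 387, "AB": 43}
angles = pie_angles(blood_groups)

for cls, theta in angles.items():
    print(f"{cls:<3}{theta:6.1f} degrees")
print("sum of angles:", sum(angles.values()))
```

The four angles add up to 360°, which serves as a quick check on the arithmetic before the segments are drawn.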


2.3 BAR DIAGRAM 

A bar diagram consists of set(s) of bars or
columns, used for the graphic representation and
comparison of the classwise frequency
distribution(s) of a nominal or discrete variable
in one or more samples.

Simple bar diagram 

This consists of a set of several parallel bars
or rectangles, one for each group or class of the
variable. The bars may be vertical or horizontal,
have equal widths chosen arbitrarily by the
investigator, and are separated from each other
by small intervening gaps indicating that real gaps
exist between the classes of the variable.
Frequencies, amounts, percentages, etc., are scaled
parallel to the lengths of the bars, starting from a
zero value to avoid any misleading impression
about the relative lengths of the bars. The length
or height of each bar is made to correspond to the
frequency, amount or percentage in the relevant
class. Because the bars are of equal widths, their
areas are directly proportional to their respective
lengths and consequently to the frequencies,
amounts or percentages in the respective classes.
The distribution in different classes is judged
by comparing the lengths (or areas) of the
respective bars (Fig. 2.2).

Fig. 2.1. A pie diagram of frequency distribution of
blood groups in a sample.



Fig. 2.2. A simple bar diagram showing the frequency 
distribution of students of different subjects 
in a college. 

Multiple bar diagram 

To show the frequency distributions in the
groups of more than one sample, a multiple bar
diagram is drawn with as many sets of bars as the
number of samples. The bars of each set show the
frequency distribution of a particular sample in
the classes of the variable and are as many as the
classes. The set of bars of each sample is separated
from those of neighbouring samples by gaps
which are wider than those between the bars of
each set (Fig. 2.3).

Fig. 2.3. A multiple bar diagram showing the distributions of Drosophila phenotypes
in three samples.

Proportional bar diagram 

It consists of a set of vertical or horizontal bars
of equal widths, separated by intervening gaps,
for the graphic comparison of the proportional
distribution of frequencies, percentages, amounts,
etc., in different classes or groups of several
samples. It presents a more explicit comparative
view of the class-wise frequency distribution in
different samples than the pie diagram.

Each sample is assigned a bar, the entire area 
of which represents the total frequency ( n ) of that 
sample. The frequencies, percentages, amounts, 
etc., are scaled along an axis parallel to the lengths 
of the bars (Fig. 2.4). Each bar is divided into as 
many segments as the number of classes or groups. 
The lengths of the segments are determined by 
the frequencies, percentages or amounts in the 
relevant classes or groups. Consequently, their 
areas are proportional to the frequencies, 
percentages or amounts in the respective classes 
or groups. For easy comparison, the classes or 
groups are arranged in the same order and shaded 
or colored similarly in all the bars. 
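
How each bar is divided into segments can be sketched as below. The amounts used are hypothetical figures invented for illustration (they are not the data of Fig. 2.4), and the function name `segment_boundaries` is likewise an assumption.

```python
def segment_boundaries(amounts):
    """Split one bar of length 100% into segments proportional to `amounts`.

    Returns the (start, end) position of each segment along the bar,
    in the order in which the classes are given.
    """
    total = sum(amounts)
    boundaries = []
    start = 0.0
    for a in amounts:
        end = start + 100 * a / total
        boundaries.append((start, end))
        start = end
    return boundaries

# Hypothetical amounts in three classes for two samples (illustration only).
samples = {
    "sample 1": [20, 10, 10],
    "sample 2": [10, 60, 30],
}
bars = {name: segment_boundaries(a) for name, a in samples.items()}
for name, segments in bars.items():
    print(name, segments)
```

Because the classes are taken in the same order for every sample, the corresponding segments of different bars can be compared directly.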

2.4 FREQUENCY POLYGON 

The frequency distribution of a continuous 
variable is often presented graphically as a 
frequency polygon. It is drawn by plotting the 
frequencies of different class intervals against their 
respective midpoints, and is an area diagram of a 
continuous frequency distribution — the area 
enclosed by the polygon represents the sample size 
(n) while the jagged surface of the polygon shows
visually how the frequency changes from class to 
class in the sample. 

(a) The scores of the variable are scaled along
the X axis (abscissa) marking the midpoints ($X_c$)
of class intervals along that axis; the $X_c$ scores of
two additional class intervals with zero frequency
are also entered on the X axis, one for the class
just below the lowest interval containing observed
frequencies, and the other for the class interval
just above the highest interval with observed
frequencies; this would enable the polygon to
reach the base line or zero frequency level at both
ends. For instance, in Example 2.4.1, the
midpoints 108 and 58 have been included in the
X axis for two additional empty class intervals,
viz., 106-110 and 56-60 respectively, besides the
midpoints for the class intervals containing the
observed score frequencies (Table 2.6 and Fig.
2.5).

Fig. 2.4. A proportional bar diagram showing the loss
of body water through different channels in
three mammalian species as percentages of
their total daily water loss.

(b) The frequencies are scaled along the Y axis 
(ordinate). The scales for scores and frequencies
should be so chosen that the ordinate at the peak 
of the polygon measures between 60-80% of its 
base. Scales for both axes should start from zero; 

if the lowest X c to be entered in the X axis is far 
higher than zero, the X axis may be breached or 
interrupted above its zero point by a zig-zag line 
to bring the lowest X c as well as the polygon closer 
to the Y axis (Fig. 2.5). 

(c) The frequency ($f$) of each class interval of
the frequency distribution is next plotted against
the midpoint ($X_c$) of the corresponding interval,
doing the same for the two additional empty
intervals whose midpoints have been entered in
the X axis. The plotted points are then joined by
straight lines to complete the polygon.

Because more than one polygon may be 
overlapped or superimposed without confusion, 
frequency polygons are very useful in comparing
the frequency distributions of a variable in several 
samples. They also give a good visual idea about 
the contours of the distribution. But the uneven 
or jagged surface of the polygon fails to portray 
precisely the proportionate frequencies in the class 
intervals because the area of the polygon between 
the ordinates at the two limits of an interval is 
hardly proportional to the frequency in the latter. 
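
The points to be plotted for a frequency polygon, including the two additional empty classes, may be listed as in the sketch below. It assumes equal class sizes; the function name `polygon_points` is introduced here for illustration, and the data are those of Example 2.4.1.

```python
def polygon_points(score_limits, frequencies):
    """Return the (midpoint, frequency) points of a frequency polygon.

    One empty class is added below the lowest interval and one above the
    highest, so that the polygon reaches zero frequency at both ends.
    All intervals are assumed to have the same class size.
    """
    size = score_limits[1][0] - score_limits[0][0]
    mids = [(lo + hi) / 2 for lo, hi in score_limits]
    xs = [mids[0] - size] + mids + [mids[-1] + size]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

# Diastolic BP class intervals and frequencies of Example 2.4.1.
intervals = [(61, 65), (66, 70), (71, 75), (76, 80), (81, 85),
             (86, 90), (91, 95), (96, 100), (101, 105)]
observed = [2, 7, 12, 23, 40, 22, 15, 8, 1]

points = polygon_points(intervals, observed)
for x_c, f in points:
    print(x_c, f)
```

The first and last points fall at the midpoints 58 and 108 with zero frequency, as required for closing the polygon on the X axis.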

Smoothed frequency polygon : 

The smaller the sample, the more is the
jaggedness of the polygon. To make the polygon
for a small sample less jagged, the observed
frequency ($f_o$) of each class interval of a small
sample needs a “smoothening”.

(a) The “smoothed” frequency ($f_s$) of each
class is the mean of the respective observed
frequencies ($f_o$) of the relevant class and of the
classes immediately above and below the latter.
$f_s$ is worked out also for each of the two additional
class intervals with no observed frequencies.

$f_s = \tfrac{1}{3}\,[(f_o \text{ of the relevant class}) + (f_o \text{ of the next lower class}) + (f_o \text{ of the next higher class})]$

(b) The computed smoothed frequencies are
plotted against the midpoints of the respective
class intervals. The plotted points are joined by
straight lines to give the smoothed frequency
polygon which, however, would not reach the X
axis at its two ends (Fig. 2.5).
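
The smoothing rule amounts to a running mean of three neighbouring frequencies. A minimal sketch follows; the function name `smoothed_frequencies` is an assumption, and the observed frequencies are those of Example 2.4.1 listed from the lowest class upwards.

```python
def smoothed_frequencies(observed):
    """Return f_s for every class, including the two additional empty classes.

    `observed` holds the f_o values from the lowest to the highest class.
    Zero frequencies are padded at both ends so that each class, and each
    additional empty class, has a lower and a higher neighbour.
    """
    padded = [0, 0] + list(observed) + [0, 0]
    return [
        (padded[j - 1] + padded[j] + padded[j + 1]) / 3
        for j in range(1, len(padded) - 1)
    ]

# Observed diastolic BP frequencies for classes 61-65 up to 101-105.
observed = [2, 7, 12, 23, 40, 22, 15, 8, 1]

smoothed = smoothed_frequencies(observed)
print([round(f, 1) for f in smoothed])
# [0.7, 3.0, 7.0, 14.0, 25.0, 28.3, 25.7, 15.0, 8.0, 3.0, 0.3]
```

The first and last values belong to the additional classes 56-60 and 106-110; since they are not zero, the smoothed polygon stops short of the X axis.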


Example 2.4.1. 

Draw the observed and smoothed frequency polygons of diastolic blood pressure (mm Hg) using the following 
diastolic BP scores in a sample. 

Class intervals :  61-65  66-70  71-75  76-80  81-85  86-90  91-95  96-100  101-105

Frequencies     :    2      7     12     23     40     22     15      8        1


Solution : 

(a) While tabulating the data in a descending order of classes in Table 2.6, two additional class intervals with 
zero frequencies, viz., 106-110 and 56-60, are entered in the table, respectively above and below the highest and 
lowest intervals containing the observed frequencies. 

(b) The midpoint ($X_c$) of each class interval is then computed and entered in the table. For example, for the
interval 71-75,

$X_c = \tfrac{1}{2}\,[(\text{lower score limit of interval}) + (\text{upper score limit})] = \dfrac{71 + 75}{2} = 73$.

(c) The observed frequencies ($f_o$) are scaled along the Y axis on a graph paper.

(d) Scores are scaled along the X axis, marking the midpoints ($X_c$) of the class intervals on that axis. As the
lowest $X_c$ of 58 is far higher than 0, a breach or gap is made on the X axis between 0 and the lowest $X_c$ (Fig. 2.5).

(e) The observed frequencies (f 0 ), tabulated in Table 2.6, are plotted against the respective midpoints (X c ) on 
the graph paper and the plotted points are joined by straight lines to complete the frequency polygon with original 
observed frequencies (Fig. 2.5). 



Table 2.6. Distributions of observed and smoothed frequencies of diastolic BP.

Class intervals   X_c    f_o    f_s
106-110           108     0     (0 + 1 + 0)/3    =  0.3
101-105           103     1     (1 + 8 + 0)/3    =  3.0
96-100             98     8     (8 + 15 + 1)/3   =  8.0
91-95              93    15     (15 + 22 + 8)/3  = 15.0
86-90              88    22     (22 + 40 + 15)/3 = 25.7
81-85              83    40     (40 + 23 + 22)/3 = 28.3
76-80              78    23     (23 + 12 + 40)/3 = 25.0
71-75              73    12     (12 + 7 + 23)/3  = 14.0
66-70              68     7     (7 + 2 + 12)/3   =  7.0
61-65              63     2     (2 + 0 + 7)/3    =  3.0
56-60              58     0     (0 + 0 + 2)/3    =  0.7




(f) The smoothed frequency ($f_s$) is worked out for each class interval and entered in Table 2.6. For example,
for the class interval 101-105,

$f_s = \tfrac{1}{3}\,[(f_o \text{ of the relevant class}) + (f_o \text{ of the next lower class}) + (f_o \text{ of the next higher class})] = \tfrac{1}{3}(1 + 8 + 0) = 3.0$.

(g) Each $f_s$ is plotted against the corresponding $X_c$ on a graph paper and the plotted points are joined by
straight lines to give the smoothed frequency polygon (Fig. 2.5).


Example 2.4.2. 

Draw the observed and smoothed frequency polygons of femur lengths (mm × 10⁻²) using the following femur
length data of a sample of aphids.

Class intervals :  31-33  34-36  37-39  40-42  43-45

Frequencies     :    5     12     20      9      4


Solution : 

(a) While tabulating the data in a descending order of classes in Table 2.7, two additional class intervals with
zero frequencies, viz., 46-48 and 28-30, are entered in the table, respectively above and below the highest and
lowest intervals containing observed frequencies.


Table 2.7. Distributions of observed and smoothed frequencies of aphid femur lengths.

Class intervals   X_c    f_o    f_s
46-48             47      0     (0 + 4 + 0)/3    =  1.3
43-45             44      4     (4 + 9 + 0)/3    =  4.3
40-42             41      9     (9 + 20 + 4)/3   = 11.0
37-39             38     20     (20 + 12 + 9)/3  = 13.7
34-36             35     12     (12 + 5 + 20)/3  = 12.3
31-33             32      5     (5 + 0 + 12)/3   =  5.7
28-30             29      0     (0 + 0 + 5)/3    =  1.7


(b) The midpoint ($X_c$) of each class interval is worked out and entered in the table. For example, for the
interval 37-39,

$X_c = \tfrac{1}{2}\,[(\text{lower score limit of interval}) + (\text{upper score limit})] = \dfrac{37 + 39}{2} = 38$.

(c) The observed frequencies ($f_o$) are scaled along the Y axis on a graph paper.

(d) Scores are scaled along the X axis, marking the midpoints ($X_c$) of the class intervals on that axis. As the
lowest $X_c$ of 29 is far higher than 0, a breach or gap is made on the X axis between 0 and the lowest $X_c$ (Fig. 2.6).

(e) The $f_o$ scores of Table 2.7 are plotted against the respective $X_c$ scores on the graph paper and the plotted
points are joined by straight lines to complete the polygon with original observed frequencies (Fig. 2.6).



Fig. 2.6. Original and smoothed frequency polygons of aphid femur length distribution. 

(f) The smoothed frequency ($f_s$) is worked out for each class interval and entered in Table 2.7. For example, for
the class interval 37-39,

$f_s = \tfrac{1}{3}\,[(f_o \text{ of the relevant class}) + (f_o \text{ of the next lower class}) + (f_o \text{ of the next higher class})] = \tfrac{1}{3}(20 + 12 + 9) = 13.7$.

(g) Each $f_s$ score is plotted against the corresponding $X_c$ on a graph paper and the plotted points are joined by
straight lines to give the smoothed frequency polygon (Fig. 2.6).


2.5 HISTOGRAM 


Histogram or column diagram is a graphic
representation of the frequency distribution of a
continuous variable. It consists of a continuous
set of bars, not separated by intervening spaces
and thus indicating the absence of any real gap in
the scale of the relevant variable. It is an area
diagram of a continuous frequency distribution;
its total area represents the sample size (n) while
the area of each bar is proportional to the
frequency of cases in a particular class interval.
Unlike the frequency polygon where all the cases
in a class interval are shown to be located at its
midpoint, the frequency of each class is shown
here as uniformly distributed over the entire
interval length. The consequent absence of the
jagged appearance makes it more convenient than
the frequency polygon for a visual portrayal of
the proportionate frequencies in the class intervals.
The outline of upper surfaces of the bars gives
also a visual idea of the shape of the frequency
curve. But the histogram is less convenient than
the polygon in comparing the frequency
distributions of more than one sample because as
many separate histograms have to be used as the
number of samples; they cannot be
superimposed on each other.

Fig. 2.7. Histogram of a frequency distribution of body
weights.

(a) To avoid intervening gaps between
consecutive class intervals, true class limits,
computed from the respective score limits of all
the classes, have to be used in plotting the
histogram. But no additional class interval with
zero frequency is included beyond the intervals
having observed frequencies.

(b) The scores (X) of the variable are scaled
along the X axis on a graph paper and the true
class limits of all the intervals of the frequency
distribution are marked on the X axis. If the lower
true limit of the lowest interval is far higher than
0, the X axis may be breached or interrupted by a
zig-zag line between 0 and that true limit; this
would bring the histogram closer to the Y axis
(Fig. 2.7).

(c) The frequencies ($f$) are scaled along the Y
axis. The scales for X and $f$ should be so chosen
that the ordinate for the highest $f$, i.e., the height
of the tallest bar, measures between 60-80% of
the base of the histogram.

(d) Two ordinates are raised on the X axis at
the true limits of each class interval, and the top
end of the rectangle being formed is closed by a
horizontal line at the level of the frequency ($f$) of
that class interval indicated by the Y scale. A bar
is thus formed with its base extending over the 
length of the class interval and its height 
corresponding to the frequency of cases in that 
interval. This is repeated for all the class intervals 
in the data to draw a set of bars. As long as the 
class intervals and so, the bases of the bars are of 
equal lengths, the areas of the bars are 
proportionate to their heights and so, to the 
frequencies in the respective intervals. 

For unequal-size class intervals, the bars differ 
in the width of their bases and their areas fail to 
indicate the proportional frequencies in the 
intervals. To remedy this, the frequency ($f$) of each
interval is divided by its class size ($i$) to give its
frequency density ($f/i$), i.e., the average frequency
for unit length. Each bar is then drawn with its
height equalling its $f/i$ and its base coinciding with
the original class size (i) of that interval. The areas 
of such bars are proportionate to the frequencies 
in the respective unequal intervals (Fig. 2.8). 
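
The effect of using frequency densities can be seen from a short computation. The sketch below uses the true limits and frequencies of Table 2.9; the function name `frequency_densities` is an assumption introduced for illustration.

```python
def frequency_densities(true_limits, frequencies):
    """Return f / i for each interval, where i = X_u - X_l is the class size."""
    return [f / (hi - lo) for (lo, hi), f in zip(true_limits, frequencies)]

# True limits and observed frequencies of the unequal intervals in Table 2.9.
true_limits = [(150.5, 160.5), (160.5, 165.5), (165.5, 170.5),
               (170.5, 175.5), (175.5, 183.5)]
frequencies = [5, 25, 20, 10, 4]

densities = frequency_densities(true_limits, frequencies)
print(densities)                      # [0.5, 5.0, 4.0, 2.0, 0.5]

# Area of each bar = height (f / i) x base (i); it recovers the frequency.
areas = [d * (hi - lo) for d, (lo, hi) in zip(densities, true_limits)]
print(areas)                          # [5.0, 25.0, 20.0, 10.0, 4.0]
```

Although the first and last bars have the same height of 0.5, their areas differ (5 and 4) because their bases differ, which is what keeps the areas proportional to the frequencies.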


Example 2.5.1.

Draw the histogram of the following frequency distribution of body weights (kg) in a sample :

Class intervals :  41-45  46-50  51-55  56-60  61-65  66-70  71-75

Frequencies     :    4      9     17     25     15      8      2

Solution : 

(a) The data are tabulated in Table 2.8; the true limits ($X_l$ and $X_u$) of the class intervals are computed and
entered. For example, for the interval 51-55,

true limit $= \tfrac{1}{2}\,[(\text{upper score limit of an interval}) + (\text{lower score limit of the next higher interval})]$;
lower true limit $X_l = \tfrac{1}{2}(50 + 51) = 50.5$; upper true limit $X_u = \tfrac{1}{2}(55 + 56) = 55.5$.

(b) The scores (X) are scaled along the X axis on a graph paper, marking true limits of class intervals along
that axis (Fig. 2.7), while the frequencies ($f$) are scaled along the Y axis.

(c) Ordinates are raised on the X axis at true limits of each class interval and the top end of the rectangle, to
be formed, is closed by a horizontal line at the level of the frequency ($f$) of that interval, read from the Y axis. This
is repeated for all the class intervals to complete the histogram (Fig. 2.7).

Table 2.8. Frequency distribution of body weights in a sample.

Class intervals                 Frequencies
Score limits    True limits     (f)
41-45           40.5-45.5         4
46-50           45.5-50.5         9
51-55           50.5-55.5        17
56-60           55.5-60.5        25
61-65           60.5-65.5        15
66-70           65.5-70.5         8
71-75           70.5-75.5         2
Total                            80 (n)


Example 2.5.2. 

Draw a histogram of the following frequency distribution of body heights (cm) in a sample : 

Class intervals :  151-160  161-165  166-170  171-175  176-183

Frequencies     :     5       25       20       10        4


Solution : 

The class intervals of this frequency distribution are of unequal lengths. 

(a) The class intervals and observed frequencies ($f$) are tabulated in Table 2.9 and the true limits of the class
intervals are computed and entered as in Example 2.5.1.

Table 2.9. Distribution of frequency densities in unequal class intervals.

        Class intervals            Size of
Score limits     True limits     interval (i)      f       f/i

  151-160        150.5-160.5         10            5       0.5
  161-165        160.5-165.5          5           25       5
  166-170        165.5-170.5          5           20       4
  171-175        170.5-175.5          5           10       2
  176-183        175.5-183.5          8            4       0.5




Fig. 2.8. Histogram of a frequency distribution of 
heights with unequal class intervals. 


Fig. 2.9. Histogram of a frequency distribution of 
aphid femur lengths. 


(b) The frequency (f) of each class interval is divided by the size (i) of that interval to get the frequency density (f/i) of the interval. For example, for the interval 151-160,

i = 10; f = 5; ∴ f/i = 5/10 = 0.5.

These frequency densities (f/i) are entered in Table 2.9 against the respective class intervals.

(c) The scores (X) are scaled along the X axis on a graph paper, marking the true limits of class intervals along that axis.

(d) The frequency densities (f/i) are scaled along the Y axis.

(e) Ordinates are raised on the X axis at the true limits of each class interval, and the top of the rectangle being formed is closed by a horizontal line at the level of the f/i of that interval, read from the Y axis. This is repeated for all the class intervals to complete the histogram (Fig. 2.8).
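The frequency densities of Table 2.9 can be recomputed with a short Python sketch such as the one below. The variable names are illustrative only. The sketch also checks a consequence of plotting f/i rather than f: the area of each bar (height × width) equals the frequency of its interval.

```python
# True limits and observed frequencies of Example 2.5.2 (body heights in cm).
true_limits = [(150.5, 160.5), (160.5, 165.5), (165.5, 170.5),
               (170.5, 175.5), (175.5, 183.5)]
frequencies = [5, 25, 20, 10, 4]

# Size (i) of each interval = upper true limit - lower true limit.
sizes = [x_u - x_l for x_l, x_u in true_limits]

# Frequency density (f/i) = height of the bar for an unequal interval.
densities = [f / i for f, i in zip(frequencies, sizes)]

# Area of each bar = height x width, which gives back the frequency f.
areas = [d * i for d, i in zip(densities, sizes)]

for (x_l, x_u), i, f, d in zip(true_limits, sizes, frequencies, densities):
    print(f"{x_l}-{x_u}  i = {i}  f = {f}  f/i = {d}")
```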


Example 2.5.3. 

Draw a histogram of the aphid femur length scores (mm × 10^-2) of the data given in Example 2.4.2.

Solution :


|te c,ass zero frequency is entered beyond the 

(b) The X_l and X_u of each class interval are computed and entered in Table 2.10. For example, for the interval 37-39,

true limit = ½ [(upper score limit of an interval) + (lower score limit of the next higher interval)];



X_l = ½ (36 + 37) = 36.5; X_u = ½ (39 + 40) = 39.5.

(c) The scores (X) are scaled along the X axis on a graph paper, marking the true limits of class intervals on that axis, while the frequencies (f) are scaled along the Y axis (Fig. 2.9).

(d) Ordinates are raised on the X axis at the true limits of each interval and the top of the rectangle formed is closed by a horizontal line at the level of the f of that interval, read from the Y axis. This is repeated for all the intervals to complete the histogram (Fig. 2.9).

Table 2.10. Frequency distribution of aphid femur lengths in a sample (data from Example 2.4.2).

        Class intervals            Frequencies
Score limits     True limits           (f)

   31-33          30.5-33.5             5
   34-36          33.5-36.5            12
   37-39          36.5-39.5            20
   40-42          39.5-42.5             9
   43-45          42.5-45.5             4

   Total                               50 (n)


2.6 FREQUENCY DISTRIBUTION CURVE 

With a very large sample, very short and equal class intervals, and scores expressed up to very fine fractions of a unit, the steplike or jagged outline of a histogram or a frequency polygon changes into a smooth curve. The graph of the frequency distribution thus becomes a frequency distribution curve, with its scores (X) and frequencies (f) scaled along the abscissa and the ordinate, respectively. The total area under the curve represents the sample size (n). The curve may be unimodal, bimodal or multimodal according as it has one, two or many peaks (Fig. 2.10). It may be either bilaterally symmetrical or asymmetric (skewed) with one tail more drawn out than the other. It may be bell-shaped, J-shaped, U-shaped or reverse J-shaped. All these depend on the pattern of frequency distribution in the sample or the population.

2.7 OGIVES

Ogives are graphical forms of distributions of cumulative frequencies (cf) or cumulative percentages (cP) of the scores of a sample.

Cumulative frequency

The cumulative frequency is the sum of frequencies of all the scores of a continuous variable in a sample, either up to a particular score (less-than cf) or above a particular score (more-than cf). However, the term cumulative frequency or cf is generally used to mean the less-than type only, unless preceded by the word "more-than".

In grouped data, the cf (less-than type) of the kth class interval is the sum of the frequencies (f) of all the k number of intervals from the lowest one to the upper limit of the kth interval under consideration. Where cf_1, cf_2, ..., cf_k are the cumulative frequencies and f_1, f_2, ..., f_k are the observed frequencies of the successive class intervals in an ascending order in a frequency distribution,

$$cf_1 = f_1;\quad cf_2 = f_1 + f_2;\quad cf_3 = f_1 + f_2 + f_3;\quad cf_k = f_1 + f_2 + \cdots + f_k,$$

and the cf of the highest class interval equals the sample size n.

Cumulative frequencies are used in the computation of percentiles, percentile ranks, deciles, quartiles and median of a sample.


Fig. 2.10. Different forms of frequency distribution curves. 


A cumulative frequency distribution is a tabulated form of cumulative frequencies according to the class intervals of the scores in the grouped data of a sample. For a cumulative frequency distribution (less-than type), the true upper limit (X_u) of each class interval is computed and recorded, because the cf of an interval includes in such a case the frequency of that interval too (Table 2.11).

Cumulative percentage 

The cumulative percentage (cP) of a class interval is its cumulative frequency (cf), up to its X_u, expressed as a percentage of the total frequency (n) of the sample.

$$cP = \frac{cf}{n} \times 100.$$

A cumulative percentage distribution is a tabulated form of cP scores according to the class intervals of the scores in the grouped data of a sample (Table 2.11). For a cP distribution, the true upper limit (X_u) of each class interval is computed and recorded, because the cP of an interval takes into consideration the frequency of that interval also.

Cumulative frequency ogive 

The cf ogive is the graphical representation of the cumulative frequencies in the class intervals of the sample. To plot it for the less-than type, the scores (X) are scaled along the X axis, marking the true upper limits (X_u) of all the class intervals on that axis; in addition, the true lower limit (X_l) of the lowest class interval is also marked on the X axis as the X_u of the next lower interval with the cf of 0. The cf of each interval is plotted against its X_u and the plotted points are joined by straight lines to give the cf ogive of the less-than type (Fig. 2.11).

Cumulative percentage ogive 

The cP ogive is the graphical representation of the cP scores in the class intervals of the sample. To plot it, the scores (X) are scaled along the X axis, marking on the latter the true upper limits (X_u) of all the class intervals, as well as the X_l of the lowest interval as the X_u of the next lower interval with the cP of 0. The cP of each interval is plotted against its X_u and the plotted points are joined by straight lines to give the cP ogive (Fig. 2.12). cP ogives are used in the graphical methods for determining percentiles, percentile ranks, quartiles and median of a sample.


Fig. 2.11. Cumulative frequency ogive for body weights of a sample.

Fig. 2.12. cP ogive for body weights of a sample.


Example 2.7.1.

Draw the cf ogive and the cP ogive of the following distribution of body weights (kg) in a sample.

Class intervals : 51-53 54-56 57-59 60-62 63-65 66-68 69-71
Frequencies :       5     7    14    28    15     8     3


Solution : 

1. The cf and cP distributions are first worked out as follows.

(a) The class intervals and their frequencies (f) are tabulated in Table 2.11. The empty class interval, immediately below the lowest interval containing observed frequencies, is also included in the table.

(b) The true upper limits (X_u) of all the intervals are computed and recorded. For example, for the interval 54-56,

X_u = ½ [(upper score limit of interval) + (lower score limit of the next higher interval)] = ½ (56 + 57) = 56.5.

(c) The cumulative frequency (cf) up to the X_u of each interval is then computed and entered in Table 2.11.

$$cf_k = f_1 + f_2 + f_3 + \cdots + f_k,$$

where cf_k is the cf up to the true upper limit of the kth interval from the lowest one, and f_1, f_2, ..., f_k are the observed frequencies of the respective class intervals. For example, for the interval 54-56, which is the third interval in the table,

$$cf_3 = f_1 + f_2 + f_3 = 0 + 5 + 7 = 12.$$

(d) Each cf is converted to the corresponding cP and the computed cPs are entered in Table 2.11. For example, for the interval 57-59,

$$cP = \frac{cf}{n} \times 100 = \frac{26}{80} \times 100 = 32.5.$$


Table 2.11. cf and cP distributions of body weights in a sample.

Class intervals     X_u       f       cf        cP

    48-50          50.5       0        0        0.0
    51-53          53.5       5        5        6.3
    54-56          56.5       7       12       15.0
    57-59          59.5      14       26       32.5
    60-62          62.5      28       54       67.5
    63-65          65.5      15       69       86.3
    66-68          68.5       8       77       96.3
    69-71          71.5       3       80 (n)  100.0


2. The cf ogive is then drawn as follows :

(a) The scores (X) are scaled along the X axis on a graph paper, marking the X_u of each class interval on that axis (Fig. 2.11).

(b) The cumulative frequencies (cf) are scaled along the Y axis.

(c) Each computed cf is plotted against the X_u of the corresponding interval, and all the plotted points are joined by straight lines to give the cf ogive.

3. The cP ogive is next drawn as follows :

(a) The X scores are scaled along the X axis of a graph paper, marking the X_u of each class interval on that axis (Fig. 2.12).

(b) The cumulative percentages (cP) are scaled along the Y axis.

(c) Each computed cP is plotted against the X_u of the corresponding interval and all the plotted points are joined by straight lines to give the cP ogive.
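The cf and cP columns of Table 2.11 can be reproduced with the following Python sketch, which uses `itertools.accumulate` from the standard library for the running totals. The variable names are illustrative. The sketch prints cP to two decimals, whereas the table rounds to one.

```python
from itertools import accumulate

# True upper limits (X_u) and frequencies (f) of Example 2.7.1, including the
# empty interval 48-50 placed below the lowest observed interval.
upper_true_limits = [50.5, 53.5, 56.5, 59.5, 62.5, 65.5, 68.5, 71.5]
frequencies = [0, 5, 7, 14, 28, 15, 8, 3]

# Less-than cumulative frequency: running total of f up to each X_u.
cf = list(accumulate(frequencies))
n = cf[-1]  # the cf of the highest interval equals the sample size

# Cumulative percentage: cf expressed as a percentage of n.
cP = [100 * c / n for c in cf]

# The points (X_u, cf) and (X_u, cP), joined by straight lines, form the ogives.
for x_u, f, c, p in zip(upper_true_limits, frequencies, cf, cP):
    print(f"X_u = {x_u}   f = {f:2d}   cf = {c:2d}   cP = {p:.2f}")
```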


2.8 SCATTERGRAM 

A combined frequency distribution of two variables is called a bivariate frequency distribution. One form of its graphical representation is the scattergram.


For a scattergram, the scores (X and Y) of two 
measurement variables are scaled along the 
abscissa and the ordinate, respectively. The Y 
score of each individual of the sample is then 
plotted against the X score of the same individual. 


Fig. 2.13. Scattergrams of different types.

Each individual is thus plotted as a point located by his scores of the two variables. The scatter diagram of all such points forms the scattergram.

The scattergram may indicate the form of relationship between the two measurement variables. A random distribution of the plotted points may indicate the absence of any relationship between the variables. An elliptical or hyperbolic distribution of the scattergram indicates the possibility of a nonlinear association between the variables. If the plotted points are distributed around a straight line with a downward or upward slope, the scattergram indicates respectively a negative or positive linear association between the variables.

2.9 LINEAR GRAPH

Where the magnitude (value or score) of one variable or event has some form of association with that of another, their magnitudes are dependent on those of each other, and each such variable or event may, therefore, be considered to be a function of the other. Frequently, two variables or events (X and Y) may have such association with each other that plotting of their scores against each other yields a distribution of the plotted points on or closely around a straight line. Such an association between the two paired variables or events is called a linear association and each of them may be considered to be a linear function of the other. Such linear association between two variables or events is characterized by uniform changes in the scores of one variable (say, Y) accompanying unit changes in the scores of the other variable (X) and vice versa; in other words, the rate of changes of Y scores does not vary with successive changes in X scores and vice versa, thus causing the plotted points for their respective scores to lie on or close to a straight line. Linear associations between two variables or events may thus be presented in the form of a straight line or linear graph, the equation for which is commonly expressed in the following slope-intercept form

$$Y = a + bX,$$

where Y and X are the variables scaled respectively along the ordinate (Y-axis) and the abscissa (X-axis), a is the intercept of the line with the Y-axis, and b is the slope of the line. The position of any point on the line is given or expressed by the coordinates or perpendicular distances of the point from respectively the X- and Y-axes. The slope-intercept equation changes to the following form where the straight line passes through the origin of the two axes and so, through the zero value of Y-intercept (i.e., a = 0).

$$Y = bX.$$

The term a is the Y-intercept of the line in its slope-intercept form and its magnitude varies opposite to that of the slope of the line. The magnitude as well as the algebraic sign of the term a indicates the general level of the line on the Y-axis. It indicates the expected, estimated or extrapolated magnitude of the variable Y for such a case as might have the zero magnitude of the variable X. A negative value of the term a signifies that the line has its Y-intercept located below the zero point of the ordinate scale.

The term b in the slope-intercept equation is the measure of slope of the line; the magnitude and the algebraic sign of the term b indicate respectively the steepness (gradient) and the direction of the slope. The term b gives a measure of the average rate of change of the scores of variable Y, scaled along the Y-axis, for unit change of the scores of X along the abscissa. The higher the magnitude of b, the stronger is the linear association between the two variables, the greater is the average rate of change of Y with unit change of X, consequently the steeper is the gradient of the line, and the lower is the Y-intercept or term a. A low value of b shows a weaker association between the variables, a lower or flatter slope of the line, and a higher magnitude of the Y-intercept of the line. The algebraic sign of term b indicates the relation between directions of changes of the two variables. A positive b signifies



that high values of one variable are generally 
accompanied by high values of the other too, while 
low values of the former are associated with low 
values of the latter; in such a case, the line has an 
ascending gradient. On the contrary, a negative b 
shows that high and low values of one variable 
are generally associated with respectively low and 
high values of the other variable; thus, the line 
has a descending gradient and its slope-intercept equation is given by Y = a - bX, where b stands for the magnitude of the slope.

Often the points, plotted with the observed 
paired scores of two variables in a sample, are so 
scattered that they cannot be joined directly to 
form a straight line. In such cases, the best-fitting 
straight line is so drawn as to keep the sum of 
squared vertical distances of the plotted points 
from that line at the lowest. This ensures the 
minimum total of the squared differences between 
the plotted points for the paired observations and 
the respective points for corresponding estimated 
values. 
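The best-fitting line described above can be sketched in Python as follows. The paired scores are hypothetical, invented only for illustration, and the helper names `least_squares_line` and `sum_of_squares` are not taken from the text. The sketch uses the standard least-squares formulas for the slope b and the intercept a, then checks that shifting the line in either direction raises the sum of squared vertical distances.

```python
def least_squares_line(xs, ys):
    """Return (a, b) of the line Y = a + bX minimising squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum of products of deviations / sum of squared deviations of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b


def sum_of_squares(a, b, xs, ys):
    """Sum of squared vertical distances of the points from Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical paired scores, invented for this illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_of_squares(a, b, xs, ys)
print(f"Y = {a:.2f} + {b:.2f} X, sum of squares = {best:.3f}")
```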

Examples of some linear plots are cited below. Mention should be made in this context that to gain greater precision in interpretations, extrapolations and estimations, equations for nonlinear associations are frequently transformed into linear equations for presentation as straight lines, such as the exponential relationships of radioactive decay with time and of vapour pressure with temperature, the sigmoid associations of initial velocities of allosteric enzyme actions with substrate concentrations and of oxygen saturation of hemoglobin with O2 tension, and the rectangular hyperbolic relations between initial velocities of enzymes and substrate concentrations.

(a) The diffusional flux (J) of a solute has a positive linear association with its transmembrane concentration gradient (Δn). The linear plot of the diffusion rate (J) against the difference (Δn) in solute concentration across the membrane conforms to the following linear equation (Fig. 2.14):

$$J = P\,\Delta n, \quad \text{or,} \quad J = 0 + P\,\Delta n,$$

where P is the permeability coefficient of the membrane for the solute in cm/sec for unit concentration difference and constitutes the slope of the line, while the Y-intercept of the latter coincides with the zero value of the diffusion rate on the ordinate scale.

Fig. 2.14. Linear plot of diffusional flux (J) against transmembrane solute concentration (Δn).

(b) Lineweaver-Burk double-reciprocal equation is a linear transformation of the Michaelis-Menten rectangular hyperbolic equation for enzyme kinetics and is derived from the reciprocal of the latter equation. It expresses the reciprocal (1/V_0) of the initial velocity of enzyme action as a linear function of the reciprocal (1/[S]) of the molar concentration of the substrate, where V_max is the maximum velocity of enzyme action and K_m is the Michaelis constant or the substrate concentration for attaining ½V_max (Fig. 2.15).

$$\frac{1}{V_0} = \frac{1}{V_{\max}} + \frac{K_m}{V_{\max}} \times \frac{1}{[S]}.$$

The straight line conforming to this equation has K_m/V_max as its slope, 1/V_max as its Y-intercept, and -1/K_m as its negative X-intercept.
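How the double-reciprocal line gives back V_max and K_m can be sketched in Python as follows. The values V_max = 10 and K_m = 2 and the substrate concentrations are assumed purely for illustration, and the velocities are generated from the Michaelis-Menten equation itself rather than measured. The helper `fit_line` is a hypothetical least-squares routine, not part of the text.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b of the line Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


# Assumed kinetic constants and substrate concentrations (illustration only).
V_MAX, K_M = 10.0, 2.0
substrate = [0.5, 1.0, 2.0, 4.0, 8.0]

# Initial velocities from the Michaelis-Menten hyperbola V0 = Vmax[S]/(Km + [S]).
v0 = [V_MAX * s / (K_M + s) for s in substrate]

# Double-reciprocal transformation: plot 1/V0 against 1/[S].
recip_s = [1 / s for s in substrate]
recip_v = [1 / v for v in v0]

intercept, slope = fit_line(recip_s, recip_v)

v_max_est = 1 / intercept         # Y-intercept = 1/Vmax
k_m_est = slope / intercept       # slope = Km/Vmax
x_intercept = -intercept / slope  # X-intercept = -1/Km

print(v_max_est, k_m_est, x_intercept)
```

With real measurements the points would scatter about the line, and the estimates would only approximate the true constants.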

(c) Woolf-Hanes equation is another linear transformation of the Michaelis-Menten hyperbolic equation, and is derived by multiplying both sides of the double-reciprocal equation by [S]. It expresses the [S]/V_0 ratio as a linear function of [S] (Fig. 2.16).

$$\frac{[S]}{V_0} = \frac{K_m}{V_{\max}} + [S] \times \frac{1}{V_{\max}}.$$

The straight line conforming to this equation has 1/V_max as its slope, K_m/V_max as its Y-intercept, and -K_m as its negative X-intercept.

(d) Hill equation for the sigmoid kinetics of allosteric enzymes is logarithmically transformed into the following equation expressing log [V_0/(V_max - V_0)] as a linear function of log [S], where K' is a constant.

$$\log \frac{V_0}{V_{\max} - V_0} = n \log [S] - \log K'.$$

The straight line resulting from this equation has as its slope the Hill coefficient (n), which indicates the type and magnitude of cooperativity between ligands binding to the enzyme. The Y-intercept of the line is -log K' (Fig. 2.17).

Fig. 2.17. Linear logarithmic plot of Hill equation.

(e) The positive exponential relation between the vapour pressure (p) of a liquid and its absolute temperature (T) may be transformed logarithmically into the following linear equation expressing log p as a negative linear function of 1/T.

$$\log p = C - \frac{L_V}{2.303R} \times \frac{1}{T},$$

where C is a constant depending on the liquid, R is the molar gas constant, and L_V is the molar heat of vaporization absorbed in vaporizing 1 mole of the liquid. The plotted line has a descending slope of -L_V/2.303R and the Y-intercept equalling C (Fig. 2.18).

Fig. 2.18. Linear relationship between the log of vapour pressure (p) and the reciprocal of absolute temperature (T).

Fig. 2.19. Linear relationship between log (N/N_0) and time t.


(f) The exponential equation for the radioactive decay of the nuclei of a specific radioisotope in a time interval (t) is transformed into the following equation for a straight line with a descending slope of -λ/2.303, where λ is the decay constant of that radioisotope while N_0 and N are the numbers of radioisotope nuclei existing respectively at the commencement and the termination of the interval t (Fig. 2.19).

$$\log \frac{N}{N_0} = -\frac{\lambda}{2.303}\,t, \quad \text{or,} \quad \log N = \log N_0 - \frac{\lambda}{2.303}\,t.$$
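The linear form of the decay equation can be checked numerically with the Python sketch below. The decay constant, the initial number of nuclei and the sampling times are assumed values chosen only for illustration. The factor 2.303 in the text is the rounded value of the natural logarithm of 10.

```python
import math

# Assumed values for illustration: decay constant per unit time and N_0.
DECAY_CONSTANT = 0.1
N0 = 1_000_000.0
times = [0, 5, 10, 15, 20]

# Exponential decay: N = N_0 * e^(-lambda * t).
remaining = [N0 * math.exp(-DECAY_CONSTANT * t) for t in times]

# Logarithmic transformation: log10(N/N_0) falls on a straight line against t.
log_ratio = [math.log10(n / N0) for n in remaining]

# Slope of that line between the first and last points.
slope = (log_ratio[-1] - log_ratio[0]) / (times[-1] - times[0])

# The slope equals -lambda / ln(10), i.e. about -lambda / 2.303.
lambda_exact = -slope * math.log(10)
lambda_rounded = -slope * 2.303

print(slope, lambda_exact, lambda_rounded)
```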


(g) The linear equation of association between 
two variables can be used in working out the linear 
regression for predicting the likely score of either 
of them from the observed score of the other in 
an individual (see §8.12). 


2.10 EXPONENTIAL CURVE 

Where a process or event consists of changes of a single variable and its rate or velocity (V) depends on the amount or concentration [A] of that variable alone, it is called a first-order reaction with k as its rate constant.

$$V = k[A].$$

The rate of change of the variable in a first-order reaction is directly proportional to the existing concentration or value of that variable alone. Thus, in each unit time, the value of the variable changes by a constant fraction of the value existing at the start of that unit time. In such events, the half-life (t_½), which is the time taken by the value of the variable to change by half its initial amount, is a constant. This makes the variable or reactant change exponentially with time; in other words, an almost infinitely long time interval would be required in attaining the change of the total amount of the initially existing variable. An equation of such a change expresses the changing variable (Y) as a function of an exponent or power (X) of the base (e) of natural logarithm and assumes the following alternative forms, according as the exponent bears respectively positive and negative algebraic signs and depending on the constants Z and k.

Where the exponent (X) bears a positive algebraic sign and the plotting of Y and X scores against each other yields an ascending exponential curve with a progressively steeper slope as X increases,

$$Y = Ze^{kX}.$$

This applies to events complying with the law of continuous growth, having progressively increasing rates of changes of the variable Y with the rise in the exponent X. The standard form of this equation used generally is : Y = e^X.

But in case of events suffering from continuous decline in the rate of change of Y with rise of exponent X, the latter bears a negative algebraic sign and the plot of Y against X yields a descending exponential curve with progressive decline in its slope as X rises. Such events conform to the law of continuous decay and follow the following exponential equation :

$$Y = Ze^{-kX},$$

or its standard form : Y = e^(-X). An example of a descending exponential curve is that for the continuous decline in the relative activity of a


radioisotope with time. This constitutes a first-order reaction with its rate directly proportional to the existing number of radioactive nuclei, so that the radioactivity changes exponentially with time. When N_0 is the initial number of radioactive nuclei, N is the number of radioactive nuclei left after a time t, and λ is the decay constant consisting of the fraction of radioactive nuclei which decays in each unit time interval,

$$Y = e^{-\lambda t}, \quad \text{or,} \quad \frac{N}{N_0} = e^{-\lambda t}.$$

So, plotting of N/N_0 against t gives a negative exponential curve with its slope getting progressively less steep with time (Fig. 2.20).

On the contrary, the continuous rise in bacterial growth rate in a bacterial culture with time can be expressed as an ascending exponential curve (Fig. 2.21). This constitutes a first-order reaction with its rate directly proportional to the existing bacterial count, so as to rise exponentially with time in compliance with the law of continuous growth and following the standard equation : Y = e^X. The curve has an upward slope whose gradient becomes progressively steeper with the length of time interval (X = t) and the Y-intercept is at the value 1 of the ordinate scale.

Fig. 2.20. Descending exponential curve of decline in activity of a radioisotope with time.

Fig. 2.21. Ascending exponential curve (Y = e^X) of rise in bacterial count in bacterial cultures with time.

Exponential equations may be logarithmically transformed into linear equations and presented as straight line graphs on natural or ordinary graph papers because logarithms of numbers of any geometric series constitute an arithmetic series (see §2.9). But instead of such linear forms obtained by using log Y, the score Y itself of an exponential series may be used directly in drawing a semi-logarithmic graph.
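The constancy of the half-life in a first-order change can be illustrated with the following Python sketch of the decay form Y = Ze^(-kX). The constants Z and k and the starting points are assumed values, and the function name `value_at` is hypothetical.

```python
import math

# Assumed constants of the decay equation Y = Z * e^(-kX), for illustration.
Z = 80.0
K = 0.25


def value_at(x):
    """Value of the declining variable Y at exponent (time) x."""
    return Z * math.exp(-K * x)


# For a first-order change the half-life is ln 2 / k.
half_life = math.log(2) / K

# Whatever the starting point, Y falls to half its value after one half-life.
starts = [0.0, 1.5, 7.0]
ratios = [value_at(x + half_life) / value_at(x) for x in starts]

# In each unit of time Y also loses the same fraction of its existing value.
fractions_lost = [1 - value_at(x + 1) / value_at(x) for x in starts]

print(half_life, ratios, fractions_lost)
```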


2.11 SEMI-LOGARITHMIC GRAPH

A natural or ordinary graph paper has both its vertical (ordinate) and horizontal (abscissa) axes scaled arithmetically. So, successive rulings are equidistant from each other in both horizontal and vertical directions on it, and both axes may start from zero values of their respective scales. On a semi-logarithmic graph paper, although the abscissa (X-axis) is scaled arithmetically starting from the zero point and successive vertical rulings are equidistant from each other, the ordinate (Y-axis) is scaled logarithmically from a non-zero point such as 1, 10 or 100, its scale shows proportionate changes instead of absolute changes, and successive horizontal rulings are located at varying distances from each other. This is because the Y-axis on such graph paper is subdivided longitudinally in terms of geometric or logarithmic series, and the distance of each horizontal ruling from the abscissa is proportional to the logarithm of the number it represents on the ordinate scale. On the contrary, a double-logarithmic graph paper has both the abscissa (X-axis) and the ordinate (Y-axis) subdivided in terms of some geometric or logarithmic series so that neither horizontal nor vertical rulings are equidistant, the scale of neither axis starts from zero, and both scales show proportionate changes instead of absolute changes.

Where a variable Y has an exponential relationship with the power X of a base (say, e of natural logarithm) in terms of an equation such as Y = e^X, the Y and X scores may be scaled along the ordinate and the abscissa respectively, and each observed Y score, instead of log Y, may be directly plotted against the corresponding value of X, thus yielding a linear semi-logarithmic graph with the plotted points. Such a graph is also known as the ratio chart, because the Y values form a geometric series having a constant ratio between successive scores or changing by a constant proportion. If the plotted points fall on the line or occur close around the latter, a constant rate prevails for the changes of Y variable; dispersions of the points away from the line indicate wider deviations from a constant rate.

Variations in the slope of the semi-logarithmic graph signify alterations in the rate of changes of Y variable. Rates of changes of two or more series of scores in geometric or exponential progressions may be compared using their respective semi-log graphs; semi-logarithmic lines would have parallel slopes if the corresponding series of scores have an identical rate of changes while lines with different slopes would indicate differences between different series in their rates.

Semi-log graphs can compare also two or more geometric or exponential series of scores in different units. They are also needed when the Y scores constitute a series with very wide range of values.
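The comparison of rates on a semi-logarithmic graph can be sketched in Python as follows. The three series are hypothetical geometric series invented for illustration, and the helper `semi_log_slopes` is not from the text. On a logarithmic ordinate the slope of a series over unit steps of X is the difference between successive values of log Y, so series with the same constant ratio give parallel lines even when their units and magnitudes differ.

```python
import math


def semi_log_slopes(series):
    """Differences of log10(Y) between successive scores at unit steps of X."""
    logs = [math.log10(y) for y in series]
    return [logs[i + 1] - logs[i] for i in range(len(logs) - 1)]


# Hypothetical geometric series, invented for illustration only.
series_a = [2, 6, 18, 54, 162]            # constant ratio 3, small values
series_b = [500, 1500, 4500, 13500]       # constant ratio 3, different units
series_c = [1, 2, 4, 8, 16]               # constant ratio 2

slopes_a = semi_log_slopes(series_a)
slopes_b = semi_log_slopes(series_b)
slopes_c = semi_log_slopes(series_c)

# A constant slope within a series means a constant rate of change of Y.
# Equal slopes for A and B mean parallel lines, hence identical rates.
print(slopes_a[0], slopes_b[0], slopes_c[0])
```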


GLOSSARY 

bar diagram : a set of parallel equal-width bars, separated from each other and having their areas proportional to the frequencies of cases or the amounts in different classes of a discontinuous or nominal variable in the sample.

bar diagram, multiple : several sets of parallel, separated and equal-width bars for as many samples, each set having its bar lengths proportional to the frequencies of cases or the amounts in different classes of a discontinuous or nominal variable in one of the samples.

bar diagram, proportional : a set of parallel and separated bars of identical widths and lengths, each bar divided into several lengthwise segments with their lengths proportional to the frequencies of cases or the amounts in different classes of a variable in one of the several samples.

bar diagram, simple : a single set of parallel, separated and equal-width bars with their lengths proportional to the frequencies of cases or the amounts in different classes of a discontinuous or nominal variable in a single sample.

exponential equation : an equation expressing the values of a variable as a function of the power of a base such as the base of natural logarithm.

frequency : the number of repeated occurrences of a score or case in the sample.




frequency, cumulative : the total frequency of scores or cases in all the class intervals from one end of the frequency distribution of a sample up to the lower or upper true limit of a given interval.

frequency distribution : a distribution of frequencies of cases or scores of a sample in different class intervals of a variable.

frequency polygon : a polygon drawn as a graphical representation of the frequency distribution of a continuous variable by plotting the frequencies of scores in different classes of the variable against the midpoints of the respective classes.

frequency table, simple : an ungrouped frequency distribution of a discrete variable, in which the frequencies of different scores of a sample are entered against the respective individual scores.

in class intervals consisting of the respective ranges of scores. 

extending between the true class limits of a class interval, 
linear equation : an equation expressing the values of a variable as a linear function of another.

ogive : ...

percentage, cumulative : cumulative frequency of any class interval expressed as a percentage of the sample size.

pie diagram : ... sectors having their respective areas proportional to the relative ...

scattergram : ... scores of two particular variables of ... other to study the form of association between those variables.

semi-logarithmic graph : ... scaled arithmetically making the vertical ... between successive horizontal rulings.



3. STATISTICS OF LOCATION 


Statistics of location belong to the class of descriptive statistics (page 11). They serve to locate specific positions of the frequency distribution of a variable in a sample on the scale of scores of that variable. They include mean, median, mode, percentiles, deciles and quartiles.

3.1 CLASSIFICATION 

Statistics of location are classified further into 
two main classes. 

(a) Measures for central values: 

They include mean, geometric mean, median 
and mode, and are also called central tendencies. 
They describe the locations of specific central 
positions of the frequency distribution of a variable 
in a sample on the scale of that variable. 

(b) Quantiles or fractiles : 

They are the scores below which lie specific 
fractions of the frequency distribution of a variable 
in a sample. They thus partition out specified 
fractions like specific numbers of quarters, one- 
tenths and one-hundredths of the distribution.
They are also called partition values and include 
quartiles, deciles and percentiles. Median is both 
a measure of central value and a partition value. 

3.2 MEAN 

Mean is the arithmetic average of a set of scores. The mean of a sample (statistical mean) and that of a population (parametric mean) are represented by the symbols X̄ and μ, respectively. Where X (or X_i) represents each individual score of a sample, ΣX (or ΣX_i) is the sum of all its scores, and n is the sample size or the total frequency of cases in the sample,

$$\bar{X} = \frac{\Sigma X}{n}.$$

This is also how the mean is worked out for a 
small sample, with its scores few in number and 
not arranged in a frequency distribution or a 
frequency table. Properties of mean are as follows: 

(a) The sum of all the scores of a sample is
given by the product of their mean and the sample
size.

$$\sum X = n\bar{X}; \text{ or, } \sum X_i = n\bar{X}.$$

(b) Mean would be the score of each individual
if the total score of the sample ($\sum X$) were equally
distributed among all its individuals.

(c) The sum of positive deviations of some of
the scores from the mean equals that of negative
deviations of the remaining scores of the sample
from it. So, the algebraic sum of the deviations of
all the individual scores from the mean amounts
to zero in any sample.

$$\sum (X - \bar{X}) = 0; \text{ or, } \sum (X_i - \bar{X}) = 0.$$

(d) The sum of squares about the mean, i.e.,
$\sum (X - \bar{X})^2$ or $\sum (X_i - \bar{X})^2$, is the sum of squared
deviations of all the scores of a sample from its
mean and is the lowest of all such sums of squares
about the respective measures for central values.

(e) If the individual scores of a sample are
all multiplied or divided by a constant number
(say, $k$), the mean also gets respectively multiplied
or divided by the same number.

$$\frac{\sum (kX)}{n} = k\bar{X}; \quad \frac{\sum X}{kn} = \frac{\bar{X}}{k}.$$

(f) If a constant number $k$ is added to or
subtracted from each score of a sample, its mean
also gets respectively increased or decreased by
the same number.

$$\frac{\sum (X + k)}{n} = \bar{X} + k; \quad \frac{\sum (X - k)}{n} = \bar{X} - k.$$




(g) In a bilaterally symmetrical and unimodal
distribution, with a single peak and neither of its
tails longer or more tapering than the other, the
mean is the exactly central score and identical with
the median and the mode. For such a distribution
of scores in a sample, mean is the most reliable,
stable and widely applicable central value.

(h) The presence of a score, with an extreme
positive or negative deviation from the mean and
not counterbalanced by the presence of another
score with an equal but opposite deviation, makes
the distribution asymmetric, displaces the mean
towards that extreme score, and causes the mean
to differ from both the median and the mode of
the distribution. In such cases, $\bar{X} > Mdn > Mo$, if
the unbalanced extreme score/scores is/are in the
high-value (positive) tail of the distribution to
make that tail longer; on the contrary, $\bar{X} < Mdn < Mo$,
if such extreme scores occur in the low-value
(negative) tail making the latter longer. This
implies that the mean is unreliable as a central
value in an asymmetric distribution which has one
tail longer than the other due to a few scores with
larger deviations in the longer tail.


(i) If the scores ($Y$) of a variable are the linear
functions of the scores ($X$) of another variable,
then the mean $\bar{Y}$ of the former is also a linear
function of the mean $\bar{X}$ of the latter. Thus, if $a$ is
the vertical intercept and $b$ is the slope of the
straight line formed by plotting the $Y$ scores against
the $X$ scores of the respective individuals in a
sample,

$Y = a + bX$, and
$\bar{Y} = a + b\bar{X}$.
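
Properties (a), (c), (e), (f) and (i) can be checked numerically. The following is a minimal Python sketch; the scores and the constants k, a and b are illustrative values assumed only for this check and are not data from the text.

```python
from math import isclose
from statistics import mean

# Illustrative scores and constants (assumed for this check only).
scores = [2.0, 4.0, 6.0, 8.0, 10.0]
k, a, b = 3.0, 1.5, 0.8

n = len(scores)
x_bar = mean(scores)

# (a) The sum of the scores equals n times the mean.
sum_equals_n_mean = isclose(sum(scores), n * x_bar)

# (c) The deviations from the mean add up to zero.
deviation_sum = sum(x - x_bar for x in scores)

# (e) Multiplying every score by k multiplies the mean by k.
scaled_mean = mean([k * x for x in scores])

# (f) Adding k to every score adds k to the mean.
shifted_mean = mean([x + k for x in scores])

# (i) If Y = a + bX, the mean of Y equals a + b times the mean of X.
y_bar = mean([a + b * x for x in scores])

print(sum_equals_n_mean, deviation_sum)
print(scaled_mean, k * x_bar)
print(shifted_mean, x_bar + k)
print(y_bar, a + b * x_bar)
```

Each printed pair should agree, apart from floating-point rounding.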

Computation from frequency tables 

In simple frequency tables with frequencies
entered against single distinct scores, each forming
a class by itself (Table 3.1), one or more
individuals possess identical scores so that the
scores are repeated in the data; but the data are
not classified into groups. The mean is computed
here from the frequencies (repetitions) of the
individual scores. Where $f_1, f_2, \ldots, f_k$ are the
frequencies ($f_i$) of the respective individual scores
($X_i$) like $X_1, X_2, \ldots, X_k$, and $n$ is the total frequency
or sample size,

$$\bar{X} = \frac{\sum f_i X_i}{n} = \frac{f_1 X_1 + f_2 X_2 + \ldots + f_k X_k}{n}.$$


Example 3.2.1. 

Compute the mean of the following interorbital width scores (mm) of a sample of pigeons. 
12.8, 11.7, 12.3, 10.8, 12.5, 11.4, 12.5, 10.9, 11.6, 11.7.

Solution :

$$\bar{X} = \frac{\sum X}{n} = \frac{12.8 + 11.7 + 12.3 + 10.8 + 12.5 + 11.4 + 12.5 + 10.9 + 11.6 + 11.7}{10} = \frac{118.2}{10} = 11.8 \text{ mm}.$$


Example 3.2.2. 

Compute the mean of the following body weight scores (kg) in a sample of humans :
66, 50, 56, 63, 68, 60, 60, 68, 72, 66, 68, 63, 60, 66, 75, 56, 72, 63.

Solution : 

As some of the scores occur repeatedly in the sample, the data are arranged
in a simple frequency table (Table 3.1).

$$\bar{X} = \frac{\sum fX}{n} = \frac{1152}{18} = 64.0 \text{ kg}.$$




Table 3.1. Frequency table of body weight scores for computation of mean. 


Scores (X)    Frequencies (f)    fX

50            1                  50
56            2                  112
60            3                  180
63            3                  189
66            3                  198
68            3                  204
72            2                  144
75            1                  75

Total         18 (n)             1152 ($\sum fX$)
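
The tabulation of Example 3.2.2 can be sketched in Python as follows. This is only an illustration of the method described above; the variable names are assumptions of the sketch, and `collections.Counter` is used here merely as a convenient way of tallying repeated scores.

```python
from collections import Counter

# Body weight scores (kg) of Example 3.2.2.
weights_kg = [66, 50, 56, 63, 68, 60, 60, 68, 72,
              66, 68, 63, 60, 66, 75, 56, 72, 63]

# Simple frequency table: each distinct score X with its frequency f.
freq = Counter(weights_kg)

n = sum(freq.values())                          # total frequency
sum_fx = sum(f * x for x, f in freq.items())    # sum of the fX products
mean_kg = sum_fx / n

for x in sorted(freq):
    print(x, freq[x], freq[x] * x)
print("n =", n, "sum fX =", sum_fx, "mean =", mean_kg)
```

The printed rows correspond to the rows of Table 3.1.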


Computation from grouped data 

This method is used for directly computing the 
mean of a continuous measurement variable 
whose scores have been arranged into a regular 
frequency distribution. 

The entire range of the observed scores is
divided into class intervals, preferably of equal
sizes, and the frequency of scores falling in each
interval is entered against the latter to form a
frequency distribution (see pages 15-16). Because
the score of each observation in a class interval is
assumed to be identical with the midpoint ($X_c$) of
that interval, the sum of the scores of each interval
is obtained as $fX_c$ by multiplying the frequency
($f$) of that interval with its midpoint ($X_c$). Thus,
the sum of the $fX_c$ values of all the intervals gives
the net sum of all the scores of the sample. Hence,

$$\bar{X} = \frac{\text{sum of all scores}}{\text{sample size}} = \frac{\sum fX_c}{n}.$$

Minor discrepancies may arise in the mean
computed in this way if the scores are grouped in
a different set of class intervals. Moreover, the
mean computed by using the midpoints of
intervals may differ slightly from that computed
directly from individual scores of ungrouped data.

The mean cannot be computed in this way if
the data have been arranged in an incomplete
distribution with open class intervals (page 16)
because the midpoint is not available for such an
interval.


Example 3.2.3. 

Compute the mean body weight from the following frequency distribution of body weights (kg) in a sample 
of humans. 

Class intervals : 51-53 54-56 57-59 60-62 63-65 66-68 69-71 

Frequencies : 4 7 12 25 13 6 3 

Solution : 

(i) The data are arranged in Table 3.2, and the midpoint ($X_c$) of each class interval is computed and entered in
the table. For example, for the interval 54-56,

$X_c$ = ½[(upper score limit) + (lower score limit)]

or, $X_c$ = ½(56 + 54) = 55.




Table 3.2. Table for computing the mean body weight from grouped data. 


Class intervals    X_c    f         fX_c

51-53              52     4         208
54-56              55     7         385
57-59              58     12        696
60-62              61     25        1525
63-65              64     13        832
66-68              67     6         402
69-71              70     3         210

Total                     70 (n)    4258



(ii) Each $X_c$ is multiplied by the frequency ($f$) of cases in that interval to work out the $fX_c$ of the latter. For
example, for the interval 57-59, $fX_c$ = 12 × 58 = 696.

(iii) The sum of the $fX_c$ values of all the intervals and the total frequency ($n$) of the sample are used to
compute the mean $\bar{X}$.

$$\bar{X} = \frac{\sum fX_c}{n} = \frac{4258}{70} = 60.8 \text{ kg}.$$
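
The steps of Example 3.2.3 can be sketched in Python as follows. The function name `grouped_mean` and its arguments are assumptions of this sketch; it simply takes every score in an interval to equal the midpoint of that interval, as described above.

```python
def grouped_mean(intervals, freqs):
    """Mean of grouped data, taking each score as its interval midpoint."""
    # Midpoint X_c = (lower score limit + upper score limit) / 2.
    midpoints = [(low + high) / 2 for low, high in intervals]
    n = sum(freqs)
    sum_fxc = sum(f * xc for f, xc in zip(freqs, midpoints))
    return sum_fxc / n

# Class intervals (score limits) and frequencies of Example 3.2.3.
intervals = [(51, 53), (54, 56), (57, 59), (60, 62),
             (63, 65), (66, 68), (69, 71)]
freqs = [4, 7, 12, 25, 13, 6, 3]

mean_kg = grouped_mean(intervals, freqs)
print(round(mean_kg, 1))  # 60.8
```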


Example 3.2.4. 

Compute the mean for the wing lengths (mm) of houseflies given below : 


3.9, 4.3, 4.8, 4.7, 4.6, 4.4, 3.7, 4.1, 4.1, 4.8, 5.3, 4.9, 4.6, 3.8, 4.0, 5.3, 5.7, 5.5, 3.9, 4.5,
3.8, 4.5, 5.0, 4.9, 4.8, 3.5, 4.3, 5.1, 3.9, 4.7, 5.6, 4.6, 4.4, 3.4, 5.1, 4.6, 3.9, 3.8, 4.8, 4.9.


Solution: 

(i) The data are first arranged into a frequency distribution and entered in Table 3.3.

Highest score = 5.7. Lowest score = 3.4. Range = 3.4 to 5.7.

Sample size ($n$) = 40. Number of intervals chosen = 5.

Size of class intervals ($i$) = [(5.7 - 3.4) + 0.1] ÷ 5 = 0.48, taken as 0.5.


(ii) The midpoint $X_c$ is computed for each class interval. For example, for the interval 3.9 - 4.3,

$X_c$ = ½(4.3 + 3.9) = 4.1.

(iii) Each $X_c$ is multiplied by the frequency ($f$) of that interval to give $fX_c$. For the interval 3.9 - 4.3, for
example,

$fX_c$ = 9 × 4.1 = 36.9.

(iv) The sum of the $fX_c$ values of all intervals and the total frequency ($n$) of the sample are used in computing
the mean ($\bar{X}$).

$$\bar{X} = \frac{\sum fX_c}{n} = \frac{180.5}{40} = 4.51 \text{ mm}.$$




Table 3.3. Table for computing the mean winglength from grouped data. 


Class intervals    X_c    f     fX_c

3.4-3.8            3.6    6     21.6
3.9-4.3            4.1    9     36.9
4.4-4.8            4.6    14    64.4
4.9-5.3            5.1    8     40.8
5.4-5.8            5.6    3     16.8

Total                     40    180.5
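
The tallying of Example 3.2.4 into class intervals can be sketched in Python as follows. This is an illustrative sketch only: the scores are those of the example, while the variable names and the device of working in tenths of a millimetre (to avoid floating-point trouble at the class boundaries) are assumptions of the sketch.

```python
# Wing lengths (mm) of the 40 houseflies of Example 3.2.4.
wing_mm = [3.9, 4.3, 4.8, 4.7, 4.6, 4.4, 3.7, 4.1, 4.1, 4.8,
           5.3, 4.9, 4.6, 3.8, 4.0, 5.3, 5.7, 5.5, 3.9, 4.5,
           3.8, 4.5, 5.0, 4.9, 4.8, 3.5, 4.3, 5.1, 3.9, 4.7,
           5.6, 4.6, 4.4, 3.4, 5.1, 4.6, 3.9, 3.8, 4.8, 4.9]

# Work in tenths of a millimetre so that class boundaries are integers.
lowest_tenths = 34      # lowest score, 3.4 mm
size_tenths = 5         # class interval size, 0.5 mm
n_classes = 5

# Tally each score into its class interval.
freqs = [0] * n_classes
for x in wing_mm:
    tenths = round(x * 10)
    freqs[(tenths - lowest_tenths) // size_tenths] += 1

# Midpoints X_c of the intervals 3.4-3.8, 3.9-4.3, and so on.
midpoints = [(lowest_tenths + size_tenths * j + (size_tenths - 1) / 2) / 10
             for j in range(n_classes)]

n = sum(freqs)
sum_fxc = sum(f * xc for f, xc in zip(freqs, midpoints))
mean_mm = sum_fxc / n

print(freqs)
print(midpoints)
print(round(sum_fxc, 1), round(mean_mm, 2))
```

The frequencies and midpoints printed should match the columns of Table 3.3.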

Computation of weighted mean 

The means ($\bar{X}_1$, $\bar{X}_2$, etc.) of a given variable in
$k$ number of groups or samples may be used to
compute the weighted mean ($\bar{X}_w$) of the full set,
using the group sizes ($n_1$, $n_2$, etc.) as weights
for the respective group means.

$$\bar{X}_w = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \ldots + n_k\bar{X}_k}{n_1 + n_2 + \ldots + n_k}.$$

Example 3.2.5. 




The mean systolic blood pressure was found to be 129.4 and 133.6 mm Hg for two groups of 12 and 15
humans, respectively. Find the mean systolic blood pressure of all the 27 humans.

Solution :

$\bar{X}_1$ = 129.4 ; $n_1$ = 12. $\bar{X}_2$ = 133.6 ; $n_2$ = 15.

$$\bar{X}_w = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} = \frac{12 \times 129.4 + 15 \times 133.6}{12 + 15} = 131.7 \text{ mm Hg}.$$
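
The weighted mean can be sketched in Python as follows. The function name `weighted_mean` is an assumption of this sketch; it weights each group mean by its group size, as in Example 3.2.5, and the same function applies when the group values are percentages rather than means.

```python
def weighted_mean(group_means, group_sizes):
    """Weighted mean of several groups, using group sizes as weights."""
    total_n = sum(group_sizes)
    weighted_sum = sum(n * m for n, m in zip(group_sizes, group_means))
    return weighted_sum / total_n

# Example 3.2.5: systolic blood pressure (mm Hg) in groups of 12 and 15.
bp_mean = weighted_mean([129.4, 133.6], [12, 15])
print(round(bp_mean, 1))  # 131.7

# Group percentages of 20 and 15 in groups of 80 and 120 individuals.
pct_mean = weighted_mean([20, 15], [80, 120])
print(pct_mean)  # 17.0
```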


Example 3.2.6. 

20% of a group of 80 men and 15% of a group of 120 women were found to be diabetic. Find the mean 
percentage of diabetics for both the groups combined. 

Solution: 

For men : $n_1$ = 80 ; percentage ($P_1$) = 20. For women : $n_2$ = 120 ; $P_2$ = 15.

$$\bar{X}_p = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2} = \frac{80 \times 20 + 120 \times 15}{80 + 120} = 17\%.$$


3.3 GEOMETRIC MEAN 

Geometric mean ($GM$) is the $n$th root of the
product of all scores ($X$) of a sample when $n$ is
the total number of scores and none of the scores
is negative or 0. It is thus the antilog of the mean
of the logarithms of scores, provided all of the
scores are higher than 0 in value. Where $\prod$
indicates the product of the terms following it,

$$GM = \sqrt[n]{X_1 X_2 X_3 \ldots X_n} = \left(\prod X\right)^{1/n} = \text{antilog}\left(\frac{\sum \log X}{n}\right).$$

GM is computed for measurements in a
logarithmic scale, frequency distributions with
scores more concentrated in the low-value tail,
and distributions of bacterial counts, reflex
reaction times, and sensory responses like
differential perception of sound frequencies of
matched intensities and optical potentials at
different illuminations. Some of its properties are
as follows.

(a) The product of all the scores ($X$) of a sample
of size $n$ equals the $n$th power of $GM$.

$$\prod X = (GM)^n.$$

(b) The arithmetic average of the logarithms
of all the scores of a sample (size = $n$) equals the
logarithm of their $GM$.

$$\log GM = \frac{\sum \log X}{n}.$$


(c) Where all scores have positive values, $\bar{X}$
either exceeds or equals $GM$.

(d) Where there are $k$ number of samples with
respective sizes of $n_1, n_2, \ldots, n_k$, the weighted $GM$
($GM_w$) of all the samples is computed from their
respective $GM$s and sample sizes.

$$\log GM_w = \frac{1}{n_1 + n_2 + \ldots + n_k}\left(n_1 \log GM_1 + n_2 \log GM_2 + \ldots + n_k \log GM_k\right).$$


Example 3.3.1. 

Compute the geometric mean of the following pH values of a number of cultures :

8.0, 8.3, 7.1, 8.2, 7.6, 7.7, 7.9, 8.0, 7.4, 8.2, 8.4, 8.1, 7.8, 7.9, 7.6.

Solution : 


Table 3.4. Treatment of pH data for GM.

Serial no.    pH (X)    log X

1             8.0       0.9030899
2             8.3       0.919078
3             7.1       0.8512583
4             8.2       0.9138138
5             7.6       0.8808135
6             7.7       0.8864907
7             7.9       0.897627
8             8.0       0.9030899
9             7.4       0.8692317
10            8.2       0.9138138
11            8.4       0.9242792
12            8.1       0.908485
13            7.8       0.8920946
14            7.9       0.897627
15            7.6       0.8808135

Total                   13.4416059 ($\sum \log X$)

$$GM = \text{antilog}\left(\frac{\sum \log X}{n}\right) = \text{antilog}\left(\frac{13.4416059}{15}\right) = \text{antilog}(0.8961071) = 7.87.$$
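
The logarithmic route to the geometric mean used in Example 3.3.1 can be sketched in Python as follows. The variable names are assumptions of this sketch; the 15 pH values are those listed in Table 3.4, and `statistics.geometric_mean` from the Python standard library (version 3.8 or later) is used only as a cross-check.

```python
from math import log10
from statistics import geometric_mean

# The 15 pH values tabulated in Table 3.4.
ph = [8.0, 8.3, 7.1, 8.2, 7.6, 7.7, 7.9, 8.0,
      7.4, 8.2, 8.4, 8.1, 7.8, 7.9, 7.6]

n = len(ph)
sum_logs = sum(log10(x) for x in ph)   # sum of log X
gm = 10 ** (sum_logs / n)              # antilog of the mean logarithm

# Cross-check against the standard-library routine.
gm_check = geometric_mean(ph)

print(n, round(sum_logs, 4), round(gm, 2), round(gm_check, 2))
```

Both routes should give the same value, a little below the arithmetic mean of the same scores.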


3.4 QUANTILES OR FRACTILES 

These statistics of location include percentiles,
quartiles and deciles. These partition values are
used in working out quartile deviation, skewness
and kurtosis (vide chapters 4 and 6).



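Percentiles are those scores in a frequency
distribution, below which lie specific percentages
of the total scores. Thus, $P_{50}$ is the score below
which lie 50% of the total scores, and is identical
with Mdn (§ 3.6). Similarly, $P_{25}$ and $P_{75}$ are the
scores below which lie respectively 25% and 75%
of the total scores.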

Quartiles are those scores in a frequency
distribution, below which lie specific numbers of
quarters of the total frequency. Thus, $Q_1$, $Q_2$, $Q_3$
and $Q_4$ are the scores below which lie respectively
one-fourth, half, three-fourths and all of the total
number of scores. $Q_2 = P_{50} = Mdn$ ; $Q_1 = P_{25}$ ; $Q_3 = P_{75}$ ;
$Q_4 = P_{100}$.

Deciles are those scores in a frequency
distribution, below which lie specific numbers of
tenth parts of the total distribution. Thus, $D_1$, $D_3$
and $D_{10}$, the first, third and tenth deciles, are the
scores below which lie the lowest 1/10th, 3/10ths and
all of the scores respectively. $D_5$ is identical with
Mdn, $P_{50}$ and $Q_2$, $D_1$ with $P_{10}$, and $D_{10}$ with $P_{100}$.

Computation of quantiles 

Where $P_p$ is the required percentile, $cf_l$ is the
cumulative frequency of all the intervals below
the true lower limit $X_l$ of the class interval
containing $P_p$, $f_p$ is the observed frequency in that
class interval, $i$ is the interval size, $n$ is the sample
size, $p$ is the proportion of total cases below $P_p$,
and $pn$ is the number of cases to be counted off in
reaching $P_p$ from the lowest score,

$$P_p = X_l + i \times \frac{pn - cf_l}{f_p}.$$

Deciles, quartiles and median are likewise
computed as the corresponding percentiles.


Example 3.4.1. 

Compute the 25th and 75th percentiles of the frequency distribution of body weights (kg) of Table 2.11.

Solution:

The cumulative frequency distribution of Table 2.11 is reproduced in Table 3.5.

(a) The size $i$ of the intervals is obtained by subtracting $X_l$ of any interval from that of the next higher one.
Using the $X_l$ scores of 65.5 and 62.5,

$i$ = 65.5 - 62.5 = 3.

(b) For computing $P_{25}$ :

$p$ = 0.25 ; $n$ = 80 ; $pn$ = 0.25 × 80 = 20.

So, 20 scores are counted off starting from the lowest class interval, thus reaching into the interval 57-59 in
which $P_{25}$ lies. The true lower limit ($X_l$) and the frequency ($f_p$) of that interval and the cumulative frequency $cf_l$
upto its lower limit amount to 56.5, 14 and 12, respectively.

$X_l$ = 56.5 ; $cf_l$ = 12 ; $f_p$ = 14 ; $pn$ = 20 ; $i$ = 3 ;

$$P_{25} \text{ or } Q_1 = X_l + i \times \frac{pn - cf_l}{f_p} = 56.5 + 3 \times \frac{20 - 12}{14} = 58.2 \text{ kg}.$$


Table 3.5. Cumulative frequencies of body weights (kg). 


Class intervals    True lower limit (X_l)    True upper limit (X_u)    f     cf

51-53              50.5                      53.5                      5     5
54-56              53.5                      56.5                      7     12
57-59              56.5                      59.5                      14    26
60-62              59.5                      62.5                      28    54
63-65              62.5                      65.5                      15    69
66-68              65.5                      68.5                      8     77
69-71              68.5                      71.5                      3     80 (n)





(c) For computing $P_{75}$ :

$p$ = 0.75 ; $pn$ = 0.75 × 80 = 60.

In counting off 60 scores starting from the lowest interval, the interval 63-65 is reached in which lies $P_{75}$.

$X_l$ = 62.5 ; $cf_l$ = 54 ; $f_p$ = 15 ; $pn$ = 60 ; $i$ = 3 ;

$$P_{75} \text{ or } Q_3 = X_l + i \times \frac{pn - cf_l}{f_p} = 62.5 + 3 \times \frac{60 - 54}{15} = 63.7 \text{ kg}.$$
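
The counting-off procedure of Example 3.4.1 can be sketched in Python as follows. The function name `grouped_percentile` and its arguments are assumptions of this sketch; it interpolates within the class interval in which the required percentile falls, using the true lower limits and frequencies of Table 3.5.

```python
def grouped_percentile(p, lower_limits, freqs, size):
    """Percentile P_p of grouped data by interpolation within a class.

    p            -- proportion of cases below the percentile (0 < p < 1)
    lower_limits -- true lower limit X_l of each class interval
    freqs        -- observed frequency of each class interval
    size         -- class interval size i
    """
    n = sum(freqs)
    target = p * n          # number of cases to be counted off (pn)
    cum = 0                 # cumulative frequency below the current class
    for x_l, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= target:
            return x_l + size * (target - cum) / f
        cum += f
    raise ValueError("percentile not reached; check p and the frequencies")

# True lower limits and frequencies of Table 3.5.
lower_limits = [50.5, 53.5, 56.5, 59.5, 62.5, 65.5, 68.5]
freqs = [5, 7, 14, 28, 15, 8, 3]

p25 = grouped_percentile(0.25, lower_limits, freqs, 3)
p75 = grouped_percentile(0.75, lower_limits, freqs, 3)
print(round(p25, 1), round(p75, 1))  # 58.2 63.7
```

Deciles and quartiles follow by passing the corresponding proportion, e.g. 0.10 for the first decile.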


Graphical determination 

Percentiles, deciles and quartiles may be
obtained graphically from cP ogives (Fig. 2.12).
A line is drawn parallel to the X axis from that
point on the Y axis which corresponds to the cP
for the required fractile ; e.g., the cP amounts to
10, 20, 25, 50 and 75 for respectively $D_1$, $D_2$, $P_{25}$,
$P_{50}$ and $P_{75}$. From the point of intersection of this
line with the ogive, an ordinate is dropped to the
X axis. The point of intersection of this ordinate
with the X axis gives the required fractile.

3.5 PERCENTILE RANKS 

A percentile rank (PR) is the rank or graded 
position of a given score on a scale of 100 among 
all the scores of a sample. It is estimated from the 
percentage of scores lying below it. 

PR from cumulative frequencies 

Where the given score $X$ belongs to a class
interval having the size $i$, the frequency $f$, the
true lower limit $X_l$ and the cumulative frequency
$cf_l$ upto $X_l$,

$$PR = 100 \times \frac{cf_l + \frac{(X - X_l)f}{i}}{n}.$$

PR from ranked scores

(a) When the given score has been assigned a
numerical rank $R$ in a descending order of
magnitude in a sample of size $n$,

$$PR = 100 - \frac{100R - 50}{n}.$$

(b) Where the ranking is in ascending order,

$$PR = \frac{100R - 50}{n}.$$

PR from cP ogive

An ordinate is raised on the X axis of a cP
ogive at the given score $X$. A horizontal line is
drawn from the point of intersection of this
ordinate with the ogive. The point of intersection
of this line with the Y axis gives the PR of the
given score.


Example 3.5.1. 

Find the PR of the score 64 of the data presented in Table 3.5 of Example 3.4.1.


Solution :

The score 64 belongs to the interval 63-65 with the true lower limit ($X_l$) of 62.5, the interval size ($i$) of 3 and the
frequency ($f$) of 15 ; the cumulative frequency ($cf_l$) upto the $X_l$ of 62.5 amounts to 54 (Table 3.5).

$X$ = 64 ; $X_l$ = 62.5 ; $i$ = 3 ; $f$ = 15 ; $cf_l$ = 54 ; $n$ = 80 ;

$$PR = 100 \times \frac{cf_l + \frac{(X - X_l)f}{i}}{n} = 100 \times \frac{54 + \frac{(64 - 62.5) \times 15}{3}}{80} = 76.9.$$




Example 3.5.2. 

Find the percentile ranks of students occupying the 3rd and 20th ranks in the descending order of merit in a 
Biology examination involving 80 students. 

Solution: 

(i) For the student with rank ($R$) of 3,

$$PR = 100 - \frac{100R - 50}{n} = 100 - \frac{100 \times 3 - 50}{80} = 96.9.$$

(ii) Similarly, for the student with rank ($R$) of 20,

$$PR = 100 - \frac{100 \times 20 - 50}{80} = 75.6.$$
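
The two routes to a percentile rank used in Examples 3.5.1 and 3.5.2 can be sketched in Python as follows. The function names and argument names are assumptions of this sketch, written from the formulas of this section.

```python
def pr_from_cumulative(x, x_l, size, f, cf_l, n):
    """Percentile rank of score x from grouped cumulative frequencies.

    x_l  -- true lower limit of the interval containing x
    size -- interval size i
    f    -- frequency of that interval
    cf_l -- cumulative frequency upto x_l
    n    -- sample size
    """
    return 100 * (cf_l + (x - x_l) * f / size) / n

def pr_from_rank(rank, n, descending=True):
    """Percentile rank of a score holding a given rank among n scores."""
    ascending_pr = (100 * rank - 50) / n
    return 100 - ascending_pr if descending else ascending_pr

# Example 3.5.1: score 64 in the interval 63-65 of Table 3.5.
pr_64 = pr_from_cumulative(64, 62.5, 3, 15, 54, 80)

# Example 3.5.2: 3rd and 20th ranks in descending order among 80 students.
pr_rank3 = pr_from_rank(3, 80)
pr_rank20 = pr_from_rank(20, 80)

print(round(pr_64, 1), round(pr_rank3, 1), round(pr_rank20, 1))
```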


3.6 MEDIAN 

Median (Mdn) is that score in a frequency
distribution, above and below which lie equal
numbers, i.e., 50%, of the scores or cases of the
sample. It is a partition value or fractile (§ 3.4)
and is identical with $D_5$, $P_{50}$ and $Q_2$. Some of its
properties are given below.

(a) The ordinate on the X axis at the Mdn 
bisects the area of a frequency distribution into 
two equal halves. 

(b) In symmetric unimodal distributions, (i)
Mdn coincides with the mean and the mode, and
(ii) the algebraic sum of deviations of the observed
scores from the median amounts to 0.

$Mdn = \bar{X} = Mo$ ; $\sum (X - Mdn) = 0$.

(c) In asymmetric distributions, (i) the median
differs from the mean and the mode, the mean
being located further than the median towards the
longer tail of the distribution, and (ii) the algebraic
sum of deviations of the scores from the median
differs from 0 in value, indicating by its positive
or negative sign respectively a longer positive or
negative tail of the distribution. Thus for an
asymmetric distribution with a longer positive tail,
$\bar{X} > Mdn > Mo$, and $\sum (X - Mdn) > 0$ ; but for a
distribution with a longer negative tail, $Mo > Mdn > \bar{X}$,
and $\sum (X - Mdn) < 0$.

(d) As the median is less deflected than the 


mean by extreme deviations of a few scores, it is 
a more reliable and representative measure of 
central value than the mean for an asymmetric 
distribution. 

(e) Median can be computed even for 
frequency distributions with open class intervals 
or unequal class sizes, and also for ranked data 
like those of psychological and achievement tests. 

Median is used in working out mean deviation,
coefficient of mean deviation, coefficient of
skewness and the median test.

Graphical determination 

After drawing a cP ogive (Example 2.7.1, 
Fig. 2.12), a line is drawn parallel to its X axis 
from that point on the Y axis which corresponds 
to the cP of 50. From where this line meets the 
ogive, an ordinate is dropped to the X axis. The 
point of intersection of this ordinate with the X 
axis gives the Mdn. 

Computation from ungrouped data 

In an ungrouped set of data, median is the
(n + 1)/2th score, counted from either the lowest
or the highest score of the sample.

(a) If there is an odd number of scores in the
sample, i.e., n is an odd number, Mdn coincides
with that observed score which belongs to the
(n + 1)/2th individual.




(b) If there is an even number of scores in the
sample, the (n + 1)/2th score falls midway between
two observed scores and is given by the average
of those two scores (Example 3.6.1).

(c) If the Mdn or (n + 1)/2th score falls within
a set of identical scores in the data, all the observed
identical scores of that set are assumed to occupy
one unit interval extending from 0.5 below the
score to 0.5 above the latter. Each score of the set
is assumed to cover that fraction of this unit
interval as is given by the reciprocal of the number
of scores in that set. Mdn is computed by adding
to the lower limit of this unit interval as many of
these fractions of that interval as the number of
identical scores of the set covered in counting off
0.50n scores, starting from the lowest score of the
data (Example 3.6.2).


Example 3.6.1. 

Find the median for the following reflex knee jerk strengths (in degrees of arc) of a sample of athletes : 

19, 21, 22, 26, 28, 30, 31, 35, 35, 37. 

Solution : 

$n$ = 10. Mdn = $\frac{n + 1}{2}$th score = $\frac{10 + 1}{2}$ or 5.5th score.

Thus, the Mdn lies between the 5th and 6th scores counted from either the lowest or the highest score.
Counting from the lowest score of 19, the 5th and 6th scores amount to 28 and 30 respectively.

$$Mdn = 5.5\text{th score} = \frac{5\text{th score} + 6\text{th score}}{2} = \frac{28 + 30}{2} = 29.$$
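
The (n + 1)/2th-score rule for ungrouped data can be sketched in Python as follows. The function name `median_ungrouped` is an assumption of this sketch; it covers cases (a) and (b) above only and does not apply the adjustment of case (c) for a median falling within a set of identical scores. The standard-library `statistics.median` is used as a cross-check.

```python
from statistics import median

def median_ungrouped(scores):
    """Median as the (n + 1)/2th score of the ordered data."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position of the median
    lower = ordered[int(position) - 1]  # score at or just below that position
    if n % 2 == 1:
        return lower                    # odd n: an observed score
    upper = ordered[int(position)]      # even n: average the two neighbours
    return (lower + upper) / 2

# Knee jerk strengths (degrees of arc) of Example 3.6.1.
knee_jerk = [19, 21, 22, 26, 28, 30, 31, 35, 35, 37]
mdn = median_ungrouped(knee_jerk)
print(mdn, median(knee_jerk))
```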


Example 3.6.2. 

Find the median for the following wing length (mm) data of a sample of cockroaches : 
20, 22, 23, 24, 26, 26, 26, 28, 29, 29, 31. 


Solution : 


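$n$ = 11. Mdn = $\frac{n + 1}{2}$th score = $\frac{11 + 1}{2}$ or 6th score.

The 6th score has five scores below it and the remaining five above it. But in counting off five scores from the
lowest one, the first four scores are passed and the count then enters the set of three identical scores of 26. These
three scores are assumed to share the unit interval from 25.5 to 26.5, each covering one-third (0.33) of it.
On counting off only one of the identical scores, the required score is reached at 25.5 + 0.33 or 25.83. This, therefore, is the
Mdn. ∴ Mdn = 25.83 mm.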


Computation from grouped data

For a continuous frequency distribution,
grouped into class intervals, Mdn is computed as
$P_{50}$ or $Q_2$ from cumulative frequencies.

$$Mdn = X_l + i \times \frac{0.50n - cf_l}{f_p},$$

where $cf_l$ is the cumulative frequency of all class
intervals below the true lower limit $X_l$ of the
interval containing the Mdn, $f_p$ is the observed
frequency in that interval, $i$ is the size of class
intervals, and $0.50n$ gives the portion of the
total frequency $n$ to be counted off from one end
of the distribution to reach the Mdn.

Such computations may sometimes
pose problems. (a) If the Mdn falls between
two intervals, the true limit between those
two intervals, i.e., the true lower limit of the
higher of the two intervals, is taken as the Mdn
(Example 3.6.4). (b) If the Mdn falls in a vacant
interval containing no case, the midpoint $X_c$ of
this interval is then taken as the Mdn (Example
3.6.5).


Example 3.6.3. 

Compute the median of the following frequency distribution of body weights (kg). 

Class intervals : 51-53 54-56 57-59 60-62 63-65 66-68 69-71

Frequencies : 5 7 14 28 15 8 3


Solution:

(a) The data are arranged in Table 3.6. The true lower limit ($X_l$) of each interval and the cumulative frequency
($cf$) upto its $X_u$ are computed. For example, for the 4th interval 60-62 from the lowest one,

$X_l$ = ½[(lower score limit of the interval) + (upper score limit of the next lower interval)]

= ½(60 + 59) = 59.5.

$cf$ = $f_1 + f_2 + f_3 + f_4$ = 5 + 7 + 14 + 28 = 54.


Table 3.6. Cumulative frequencies of body weight data ($cf$ upto the respective $X_u$ scores).


Class intervals    X_l     f           cf

51-53              50.5    5           5
54-56              53.5    7           12
57-59              56.5    14          26
60-62              59.5    28 (f_p)    54 (median class)
63-65              62.5    15          69
66-68              65.5    8           77
69-71              68.5    3           80 (n)


(b) The size $i$ of the class intervals is obtained by subtracting the $X_l$ of any interval from the $X_l$ of the next
higher one. Thus, $i$ = 59.5 - 56.5 = 3.

(c) The number of scores, to be counted off from one end of the distribution to reach the Mdn, is given by
$0.50n$. Thus, $0.50n$ = 0.50 × 80 = 40.

(d) The counting off of 40 scores with effect from the lowest score leads into the interval 60-62 (median
class) in which the Mdn lies. The true lower limit ($X_l$) of this interval is 59.5, the interval has a frequency ($f_p$) of
28 cases, and the cumulative frequency ($cf_l$) upto the $X_l$ of this interval amounts to 26.

$X_l$ = 59.5 ; $cf_l$ = 26 ; $f_p$ = 28.

(e) The Mdn is then computed as follows :

$$Mdn = X_l + i \times \frac{0.50n - cf_l}{f_p} = 59.5 + 3 \times \frac{40 - 26}{28} = 61.0 \text{ kg}.$$




Example 3.6.4.

Calculate the median for the following distribution of achievement test scores in a group of students.

Class intervals : 67-76 77-86 87-96 97-106 107-116 117-126

Frequencies : 8 13 18 19 15 5


Solution :

The data are arranged in Table 3.7.

(a) The true lower limit ($X_l$) of each interval and the cumulative frequency ($cf$) upto its $X_u$ are computed. For
example, for the 4th interval 97-106 from the lowest,

$X_l$ = ½[(lower score limit of the interval) + (upper score limit of the next lower interval)]

= ½(97 + 96) = 96.5.

$cf$ = $f_1 + f_2 + f_3 + f_4$ = 8 + 13 + 18 + 19 = 58.

(b) The size $i$ of the class intervals is obtained by subtracting the $X_l$ of any interval from the $X_l$ of the next
higher one. Thus, $i$ = 86.5 - 76.5 = 10.



Table 3.7. Cumulative frequencies of achievement test scores ($cf$ upto the respective $X_u$ scores).

Class intervals    X_l      f     cf

117-126            116.5    5     78 (n)
107-116            106.5    15    73
97-106             96.5     19    58
87-96              86.5     18    39
77-86              76.5     13    21
67-76              66.5     8     8


(c) The number of scores, to be counted off from one end of the distribution to reach the Mdn, is given by
$0.50n$. Thus, $0.50n$ = 0.50 × 78 = 39.

(d) The counting off of 39 scores, starting from the lowest interval, leads exactly upto the $X_l$ of the interval
97-106. The Mdn, therefore, falls between the intervals 87-96 and 97-106. So, the $X_l$ (96.5) of the higher of these
two intervals, viz., 97-106, is taken as the median. Thus, Mdn = 96.5.

[The same result is also obtained on applying the formula used in the last example.

$$Mdn = X_l + i \times \frac{0.50n - cf_l}{f_p} = 96.5 + 10 \times \frac{39 - 39}{19} = 96.5.]$$


Example 3.6.5. 

Calculate the median of the following frequency distribution of serum iron ($\mu$g dL$^{-1}$) in 32 humans.

Class intervals : 97-106 107-116 117-126 127-136 137-146 147-156

Frequencies : 3 5 8 0 11 5


Solution :

The data are arranged in Table 3.8.

(a) The true lower limit ($X_l$) of each interval and the cumulative frequency ($cf$) upto its $X_u$ are computed. For
example, for the 3rd interval 117-126 from the lowest one,

$X_l$ = ½[(lower score limit of the interval) + (upper score limit of the next lower interval)]

= ½(117 + 116) = 116.5.

$cf$ = $f_1 + f_2 + f_3$ = 3 + 5 + 8 = 16.

(b) The size $i$ of the class intervals is obtained by subtracting the $X_l$ of any interval from the $X_l$ of the next
higher interval ; thus, $i$ = 116.5 - 106.5 = 10.


Table 3.8. Cumulative frequencies of serum iron data ($cf$ upto the respective $X_u$ scores).


Class intervals    X_l      f     cf

97-106             96.5     3     3
107-116            106.5    5     8
117-126            116.5    8     16
127-136            126.5    0     16 (median class)
137-146            136.5    11    27
147-156            146.5    5     32 (n)


(c) The number of scores, to be counted off from one end of the distribution to reach the Mdn, is given by
$0.50n$. Thus, $0.50n$ = 0.50 × 32 = 16.


(d) The counting off of 16 scores starting from the lowest interval brings us exactly upto the $X_l$ of the interval
127-136, in which the Mdn should fall ; but this is a vacant interval with no case. So, the midpoint ($X_c$) of that
interval is taken as the Mdn. For this interval,

$X_c$ = ½($X_u$ + $X_l$) = ½(136.5 + 126.5) = 131.5. Mdn = 131.5 $\mu$g dL$^{-1}$.
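
The three situations met in Examples 3.6.3 to 3.6.5 can be sketched together in Python as follows. The function name `grouped_median` is an assumption of this sketch. It handles a single vacant interval lying immediately above the point where 0.50n is reached; distributions with several consecutive vacant intervals are not covered.

```python
def grouped_median(lower_limits, freqs, size):
    """Median of grouped data with equal class intervals.

    lower_limits -- true lower limit X_l of each class interval
    freqs        -- observed frequency of each class interval
    size         -- class interval size i
    """
    n = sum(freqs)
    target = 0.5 * n        # number of cases to be counted off (0.50n)
    cum = 0                 # cumulative frequency below the current class
    for idx, (x_l, f) in enumerate(zip(lower_limits, freqs)):
        if cum + f > target:
            # Usual case: interpolate within the median class.
            return x_l + size * (target - cum) / f
        if cum + f == target:
            nxt = idx + 1
            if nxt < len(freqs) and freqs[nxt] == 0:
                # Median falls in a vacant interval: take its midpoint.
                return lower_limits[nxt] + size / 2
            # Median falls between two intervals: take their common limit.
            return x_l + size
        cum += f
    raise ValueError("empty distribution")

# Example 3.6.3 (body weights, kg).
mdn_weight = grouped_median(
    [50.5, 53.5, 56.5, 59.5, 62.5, 65.5, 68.5], [5, 7, 14, 28, 15, 8, 3], 3)

# Example 3.6.4 (achievement test scores).
mdn_test = grouped_median(
    [66.5, 76.5, 86.5, 96.5, 106.5, 116.5], [8, 13, 18, 19, 15, 5], 10)

# Example 3.6.5 (serum iron).
mdn_iron = grouped_median(
    [96.5, 106.5, 116.5, 126.5, 136.5, 146.5], [3, 5, 8, 0, 11, 5], 10)

print(mdn_weight, mdn_test, mdn_iron)
```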


3.7 MODE 

The mode ($Mo$) is that score of the variable
which belongs to the largest number of individuals
in a sample. It is, therefore, the most frequent score
in the sample and coincides with that point on the
X axis of a frequency distribution which
corresponds to the peak of the latter. Some of its
properties are as follows.

(a) A distribution may be unimodal, bimodal
or multimodal, according to its one, two or more
peaks and as many $Mo$ values.

(b) There is no mode if all scores of the sample
are either identical or have the same frequency.

(c) In a perfectly symmetric unimodal
distribution, $Mo$, $\bar{X}$ and Mdn are identical.

(d) Mode, unlike median and mean, does not
change even if some extreme scores occur in only
one tail of the frequency distribution.

(e) In an asymmetric unimodal distribution,
Mdn lies between $Mo$ and $\bar{X}$ while $Mo$ lies on
that side of the Mdn which leads to the shorter
tail of the distribution. Thus, for a distribution with
a longer positive (high-value) tail, $\bar{X} > Mdn > Mo$ ;
but if the negative (low-value) tail is longer
than the positive one, $Mo > Mdn > \bar{X}$.

(f) The amount and algebraic sign of the
deviation of the mean from the mode indicate
respectively the degree and direction of
asymmetry of the distribution.

Computations of mode 

(a) In a simple series of scores or in the
ungrouped data of a simple frequency distribution,
$Mo$ is the most frequent score.

(b) In the grouped data of a quantitative
frequency distribution with class intervals of equal
size, the mode occurs in the interval having the
highest frequency of scores. Where $X_l$ is the true
lower limit of the modal class having the highest
frequency of scores and carrying the mode, $f_m$ is
the frequency of that modal class, $f_{m-1}$ and $f_{m+1}$
are the frequencies of the class intervals that
respectively precede and follow the modal class
and $i$ is the size of each class interval,

$d_1 = f_m - f_{m-1}$ ; $d_2 = f_m - f_{m+1}$ ;

$d_1 + d_2 = 2f_m - f_{m-1} - f_{m+1}$ ;

$$Mo = X_l + i \times \frac{d_1}{d_1 + d_2}.$$

$Mo$ can be so computed even in incomplete
frequency distributions with open class intervals.

(c) In a grouped distribution with class
intervals of unequal sizes (lengths), mode is
worked out approximately from the mean and the
median of the sample : $Mo = 3Mdn - 2\bar{X}$. For
example, the mode of a frequency distribution
having a mean of 73.12 and a median of 73.0, is
given by :

$Mo = 3Mdn - 2\bar{X}$

= 3 × 73 - 2 × 73.12 = 72.76.


Example 3.7.1. 

Compute the mode of the following frequency distribution of body weights (kg).

Class intervals : 51-53 54-56 57-59 60-62 63-65 66-68 69-71

Frequencies : 5 7 14 28 15 8 3

Solution :


(a) The data are arranged in Table 3.9. The true lower limit ($X_l$) of each interval is computed and entered
there. For example, for the interval 60-62,

$X_l$ = ½[(lower score limit of the interval) + (upper score limit of the next lower interval)]

= ½(60 + 59) = 59.5.


(c) The class interval 60-62, having the highest frequency of 28, is identified as the modal class. The frequencies
of the modal class, the class preceding it and the class following it, are identified as $f_m$, $f_{m-1}$ and $f_{m+1}$ respectively
and noted. Thus, $f_m$ = 28 ; $f_{m-1}$ = 14 ; $f_{m+1}$ = 15.




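(d) The true lower limit ($X_l$) of the modal class is noted from Table 3.9. Thus, $X_l$ = 59.5.

$d_1 = f_m - f_{m-1}$ = 28 - 14 = 14 ; $d_2 = f_m - f_{m+1}$ = 28 - 15 = 13.

With the interval size $i$ = 3,

$$Mo = X_l + i \times \frac{d_1}{d_1 + d_2} = 59.5 + 3 \times \frac{14}{14 + 13} = 61.1 \text{ kg}.$$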
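
The computation of the mode from grouped data can be sketched in Python as follows. The function names are assumptions of this sketch, and the interpolation $Mo = X_l + i \times d_1/(d_1 + d_2)$ is assumed here as the formula to which the differences $d_1$ and $d_2$ of Example 3.7.1 belong. Where several intervals share the highest frequency, the sketch simply takes the first of them.

```python
def grouped_mode(lower_limits, freqs, size):
    """Mode of grouped data with equal class intervals (assumed formula)."""
    m = freqs.index(max(freqs))     # position of the modal class
    f_m = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = f_m - f_prev               # excess over the preceding class
    d2 = f_m - f_next               # excess over the following class
    if d1 + d2 == 0:
        raise ValueError("no distinct modal class")
    return lower_limits[m] + size * d1 / (d1 + d2)

def mode_from_mean_and_median(mean, median):
    """Approximate mode for unequal class intervals: 3 Mdn - 2 mean."""
    return 3 * median - 2 * mean

# Example 3.7.1 (body weights, kg).
lower_limits = [50.5, 53.5, 56.5, 59.5, 62.5, 65.5, 68.5]
freqs = [5, 7, 14, 28, 15, 8, 3]
mo = grouped_mode(lower_limits, freqs, 3)
print(round(mo, 1))

# Mean of 73.12 and median of 73.0, as in the text.
print(round(mode_from_mean_and_median(73.12, 73.0), 2))  # 72.76
```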



GLOSSARY 

central tendencies : statistics of location such as mean, geometric mean and median which are specific scores
near the middle of the frequency distribution, around which all the scores of the sample are distributed.

central values, measures of : same as central tendencies.

deciles : scores, below each of which there is a given number of one-tenths of all the scores of the sample.

fractiles : scores, below each of which lies a given fraction of all the scores, i.e., of the frequency distribution, of 
a sample. 

mean : a measure of central value, worked out as the arithmetic average of the scores of a sample. 

mean, geometric : a measure of central value, worked out as the nth root of the product of all the n number of 
scores of the sample. 

mean, weighted : a mean of a number of samples, taken together, worked out from the sum of the products of the
means of those samples and their respective sample sizes. 

median : a measure of central value, below which lies 50% of all the scores, i.e., the lower half of the frequency
distribution, in a sample. 

mode : a measure of central value, being the score which has the highest frequency of occurrence in the sample 
and thus corresponds to the peak of the distribution. 

percentiles : fractiles, below each of which lies a given percentage of the total number of scores in the sample. 

percentile rank : a rank given to a score, on a scale of 100, among all the scores of a sample. 

quantiles : same as fractiles. 

quartiles: fractiles, below each of which lies a given number of quarters of the frequency distribution, i.e., of the 
total number of scores, in a sample. 

sum of squares : the sum of the squared differences between the scores of a sample and the sample mean. 








4. STATISTICS OF DISPERSION

Dispersion is the scatter of the scores of a sample around central values such as the mean, median or mode of the sample. Statistics of dispersion serve as measures of variability of scores and give an idea about their frequency distribution.

4.1 MEASURES OF DISPERSION

These are such statistics as describe the property of dispersion or variability of scores of a variable in a sample. They express numerically the scatter of the scores of a sample from a given central value like the mean. Such measures belong to the following two classes.

Absolute measures of dispersion :

These are worked out directly from the raw scores of a variable and are expressed in the same unit as that of those scores. They are widely used in statistical work and include range, mean deviation, standard deviation, variance, quartile deviation and central moments. But (i) they cannot be used in comparing the dispersions of scores of more than one variable expressed in different units; e.g., the standard deviations (SD) of body heights (cm) and body weights (kg) in a sample cannot be used for comparing their variabilities as these SDs are expressed in cm and kg respectively. Moreover, (ii) these absolute measures are not suitable for comparing the variabilities of two sets of scores, expressed in the same unit but having widely divergent central values like the mean; for example, SD is not suitable for comparing the variabilities of femur lengths of giraffes and rats, both in cm units, because the mean and consequently the SD of the variable in giraffes far exceed those in rats.

Relative measures of dispersion :

These are worked out not directly from the raw scores of a variable but from some absolute measures of dispersion and the corresponding measures of central values. Each relative measure is derived from an absolute measure of dispersion like SD and a central value like the mean, and is expressed as a percentage of the latter; it is consequently free from the unit of the corresponding raw scores of the variable. Relative measures are (i) very suitable for comparing the variabilities of two sets of scores given in different units, and (ii) also suitable for comparing two sets of scores in the same unit but diverging very widely from each other in their central values. Moreover, they can also compare the precisions of two sets of data. Relative measures include coefficient of variation and coefficient of quartile deviation.

4.2 RANGE

Range is the interval from the highest to the lowest score of a measurement variable in a sample; all the scores of the sample lie within the range. In computations using the range, its value is taken as follows.

Range = [(highest score) - (lowest score)] + 1.

It is the simplest absolute measure of dispersion, and bears the same unit as that of the raw scores. If the lowest and highest blood sugar scores are 60 and 114 mg dL$^{-1}$ respectively in a sample, the range is (114 - 60) + 1, or 55 mg dL$^{-1}$. The range of IQ scores amounts to (105 - 80) + 1, or 26, in a sample with the minimum and maximum scores of respectively 80 and 105.




Range is used in working out quantitative frequency distributions and in statistical quality control. But it has many limitations : (i) The range cannot be found out for incomplete frequency distributions with open class intervals because the lowest and/or the highest scores of the sample are not available there. (ii) It indicates only the two extreme scores of the sample with no information about the magnitudes and frequencies of intermediate scores. (iii) It gives no indication about the form of the distribution of scores, whether symmetric or skewed, whether unimodal or multimodal, whether mesokurtic, leptokurtic or platykurtic. (iv) It is very unstable and varies greatly with the sample size : the larger the sample, the greater is the chance of including more extreme scores and so, the wider is the range. (v) Even when all other scores are close to each other, a single extreme score may increase the range disproportionately.
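The computation of the range, and its sensitivity to a single extreme score, may be sketched in Python as follows. The function name `inclusive_range` and the intermediate scores in the lists are hypothetical; only the extreme scores 60 and 114 come from the text.

```python
def inclusive_range(scores):
    """Range taken as (highest score - lowest score) + 1."""
    return (max(scores) - min(scores)) + 1


# Blood sugar illustration: lowest 60, highest 114 (85 is a made-up middle score).
print(inclusive_range([60, 85, 114]))

# Limitation (v): one extreme score widens the range disproportionately.
close_scores = [98, 100, 101, 102, 103]      # hypothetical scores
print(inclusive_range(close_scores))
print(inclusive_range(close_scores + [140]))
```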

4.3 MEAN DEVIATION 

As the algebraic sum of the deviations of all scores of a sample from the mean, viz., $\sum (X - \bar{X})$, always amounts to 0, a measure of absolute dispersion cannot be straightway worked out as the arithmetic average of those deviations. So, mean deviation (MD) is sometimes computed as an absolute measure, using the sum of absolute deviations of the scores from the mean, disregarding the algebraic signs of those deviations.

$$MD = \frac{\sum |X - \bar{X}|}{n}$$

where the bars indicate the ignoring of the algebraic signs. MD bears the same unit as that of the raw scores. But (i) it measures the variations of scores in magnitude only, not in direction; so, (ii) it is unsuitable for much further statistical work.
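As a minimal sketch of the above, the following Python lines show that the signed deviations sum to 0 while the absolute deviations give the MD. The scores and the function name `mean_deviation` are hypothetical and serve only as an illustration.

```python
def mean_deviation(scores):
    """Mean of the absolute deviations of the scores from their mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n


scores = [2, 4, 6, 8, 10]                      # hypothetical scores, mean = 6
mean = sum(scores) / len(scores)
signed_sum = sum(x - mean for x in scores)     # algebraic sum of deviations
print(signed_sum)
print(mean_deviation(scores))
```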

4.4 STANDARD DEVIATION

Standard deviation (SD) is the positive square root of the mean of squared deviations of all the scores from their mean. It is, in other words, the positive square root of variance or of the second moment about the mean (vide § 4.5 and 4.6). It is a very useful absolute measure of dispersion and is in the same unit as the original scores. The standard deviations of a sample and of a population are denoted by $s$ and $\sigma$ respectively. For a sample,

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{\sum x^2}{n}}$$

where $X$ represents the individual scores, $\bar{X}$ is the sample mean, $(X - \bar{X})$ or $x$ is the deviation of a score from $\bar{X}$, and $n$ is the total frequency or sample size ; the sum of the squared deviations, viz., $\sum (X - \bar{X})^2$, used in this computation, is called the sum of squares (SS). For a population with size $N$ and mean $\mu$,

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}.$$

Some properties of SD are as follows.

(a) Because SD takes into consideration all the
scores of a sample, it changes with the change of
even a single score.

(b) Addition or subtraction of a constant
number to or from each individual score leaves 
the SD unaltered, but the multiplication or division 
of each score by a constant number produces an 
identical change in the SD. 

(c) If all scores have an identical value in a 
sample, the SD amounts to 0. 

(d) The higher the SD, the wider is the 
dispersion of scores around the mean. 

(e) In different samples from the same 
population, SDs differ far less than do the other 
absolute measures of dispersion. 

(f) If the scores of a variable $Y$ are linear functions of the scores of another variable $X$, the plotting of the $Y$ scores against the $X$ scores of the respective individuals produces a straight line with $a$ as its vertical intercept and $b$ as its slope. Thus, for the $X$ and $Y$ scores of each individual,

$$Y = a + bX.$$

In such a case, if $s_Y$ and $s_X$ are the SDs of the respective variables, and the bars on the sides of $b$ indicate that the latter is taken as positive irrespective of its actual algebraic sign,

$$s_Y = |b|\, s_X.$$

(g) In a small sample ($n < 30$), extreme scores at the two ends of the frequency distribution may escape inclusion due to their less frequent occurrence in the population. Because SD depends on all the scores, the exclusion of many extreme scores from the sample tends to lower the SD of a small sample much below the population $\sigma$. The SD of a small sample thus suffers from an undesirable downward bias.


(h) The composite SD of a number of groups can be computed from their individual group sizes ($n_i$), the group SDs ($s_i$), and the deviations of the respective group means ($\bar{X}_i$) from the grand mean ($\bar{X}$) of all the groups.

$$s = \sqrt{\frac{\sum n_i s_i^2 + \sum n_i (\bar{X}_i - \bar{X})^2}{\sum n_i}}$$
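Property (h) can be illustrated with a small Python sketch. The two groups of scores and the function name `composite_sd` are hypothetical. The sketch assumes that each group SD is computed with $n$ as the denominator, in which case the composite SD coincides with the SD of all the scores pooled together.

```python
from math import sqrt
from statistics import mean, pstdev


def composite_sd(sizes, means, sds):
    """Composite SD from group sizes, group means and group SDs (n denominator)."""
    total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total
    within = sum(n * s ** 2 for n, s in zip(sizes, sds))
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    return sqrt((within + between) / total)


group_a = [55, 60, 62, 58]             # hypothetical scores
group_b = [57, 61, 59, 60, 61, 62]     # hypothetical scores

combined = composite_sd(
    sizes=[len(group_a), len(group_b)],
    means=[mean(group_a), mean(group_b)],
    sds=[pstdev(group_a), pstdev(group_b)],   # pstdev uses n as denominator
)
pooled = pstdev(group_a + group_b)            # SD of all scores taken together
print(combined, pooled)
```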



Computation from simple ungrouped series

1. For large samples :

SD may be computed from the simple series of ungrouped scores ($X$), with $n$ as the denominator, if the total frequency ($n$) of the sample is large (> 30) ; for example, by the defining formula

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}.$$

2. Unbiased SD for small samples :

The downward bias of the SD of a small sample, due to the reason cited in paragraph (g) above, may be compensated by using its degrees of freedom $(n - 1)$ instead of $n$ in its computation. Such an SD is called an unbiased SD. (For degrees of freedom, see § 5.5.)

(i) $s = \sqrt{\dfrac{n\sum X^2 - (\sum X)^2}{n(n - 1)}}$

(ii) $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$

Because even a large sample is smaller than the population, the SD of even such a sample, when computed with $n$ as the denominator, suffers from some downward bias, though small enough to be neglected often. Hence, it is preferable to compute the unbiased SD for even large samples using $(n - 1)$ instead of $n$ as the denominator.

Example 4.4.1. 

Compute the SD of the following body weights (kg) of 16 men :

55, 60, 62, 58, 57, 61, 59, 60, 61, 62, 67, 48, 54, 65, 63, 52.

Solution :

(i) The data are arranged in Table 4.1 and the mean $\bar{X}$ is computed using the sum of the scores.

$\sum X = 944$ ; $n = 16$ ; $\therefore \bar{X} = \dfrac{\sum X}{n} = \dfrac{944}{16} = 59.0$ kg.




(ii) The deviation of each score from the mean, viz., $(X - \bar{X})$, is worked out, recorded with its algebraic sign, and squared (Table 4.1).

(iii) The sum of all the $(X - \bar{X})^2$ values is used in working out the unbiased SD ($s$).

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{360}{16 - 1}} = 4.90 \text{ kg}.$$


Table 4.1. Table for computing mean and SD of body weights.

Serial No.   Body weight ($X$)   $X - \bar{X}$   $(X - \bar{X})^2$
 1           55                  -4              16
 2           60                  +1               1
 3           62                  +3               9
 4           58                  -1               1
 5           57                  -2               4
 6           61                  +2               4
 7           59                   0               0
 8           60                  +1               1
 9           61                  +2               4
10           62                  +3               9
11           67                  +8              64
12           48                  -11            121
13           54                  -5              25
14           65                  +6              36
15           63                  +4              16
16           52                  -7              49
Total        944 ($\sum X$)                      360 ($\sum (X - \bar{X})^2$)


Alternative solution :

(i) After entering the data in Table 4.2, each $X$ score is squared and the squared scores are also entered in the table. Both $X$ and $X^2$ scores are totalled to give $\sum X$ and $\sum X^2$, respectively.

(ii) The unbiased SD ($s$) is worked out using $\sum X$ and $\sum X^2$ :

$$s = \sqrt{\frac{n\sum X^2 - (\sum X)^2}{n(n - 1)}} = \sqrt{\frac{16 \times 56056 - 944^2}{16(16 - 1)}} = 4.90 \text{ kg}.$$


Table 4.2. Table for computing SD of body weights using raw scores.

Serial No.   Body weight ($X$)   $X^2$
 1           55                  3025
 2           60                  3600
 3           62                  3844
 4           58                  3364
 5           57                  3249
 6           61                  3721
 7           59                  3481
 8           60                  3600
 9           61                  3721
10           62                  3844
11           67                  4489
12           48                  2304
13           54                  2916
14           65                  4225
15           63                  3969
16           52                  2704
Total        944 ($\sum X$)      56056 ($\sum X^2$)
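The two routes to the unbiased SD used in Example 4.4.1 can be compared in a short Python sketch. The function names are illustrative only; the data are the 16 body weights of the example.

```python
from math import sqrt

weights = [55, 60, 62, 58, 57, 61, 59, 60, 61, 62, 67, 48, 54, 65, 63, 52]


def sd_from_deviations(scores):
    """Unbiased SD from the sum of squared deviations about the mean."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    return sqrt(sum_of_squares / (n - 1))


def sd_from_raw_scores(scores):
    """Unbiased SD from the raw sums: sqrt((n*SumX2 - (SumX)^2) / (n*(n-1)))."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x ** 2 for x in scores)
    return sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


print(round(sd_from_deviations(weights), 2))
print(round(sd_from_raw_scores(weights), 2))
```

The raw-score formula avoids computing the individual deviations, which is convenient when the mean is not a whole number.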




Example 4.4.2. 

Compute the SD of the following memory test scores of 20 High School students :

9, 10, 12, 15, 9, 11, 16, 10, 13, 9, 12, 10, 14, 13, 15, 16, 13, 10, 12, 14.

Solution :

(i) After entering the data in Table 4.3, the mean $\bar{X}$ is computed using the sum of the scores.

$\sum X = 243$ ; $n = 20$ ; $\therefore \bar{X} = \dfrac{\sum X}{n} = \dfrac{243}{20} = 12.15$.

(iii) Alternatively :

$$s = \sqrt{\frac{n\sum X^2 - (\sum X)^2}{n(n - 1)}} = \sqrt{\frac{20 \times 3057 - 243^2}{20(20 - 1)}} = 2.346.$$


Example 4.4.3. 

Find the unbiased SD of the following winglength scores (mm) of a sample :

35, 36, 26, 28, 44, 30, 22, 33, 27, 25, 40, 44, 35, 31, 29, 32.

Solution :

(a) After tabulating the scores in Table 4.4, each $X$ score is squared and the squared scores ($X^2$) are entered in the table. Both $X$ and $X^2$ scores are totalled to give $\sum X$ and $\sum X^2$ respectively, which are used in working out the unbiased SD.




$$s = \sqrt{\frac{n\sum X^2 - (\sum X)^2}{n(n - 1)}} = \sqrt{\frac{16 \times 17331 - 517^2}{16(16 - 1)}} = 6.46 \text{ mm}.$$


Table 4.4. Table for computing SD of winglengths from raw scores.

Serial No.   Winglength ($X$)   $X^2$
 1           35                 1225
 2           36                 1296
 3           26                  676
 4           28                  784
 5           44                 1936
 6           30                  900
 7           22                  484
 8           33                 1089
 9           27                  729
10           25                  625
11           40                 1600
12           44                 1936
13           35                 1225
14           31                  961
15           29                  841
16           32                 1024
Total        517 ($\sum X$)     17331 ($\sum X^2$)


(b) Alternatively :

(i) After tabulating the $X$ scores in Table 4.5, the mean $\bar{X}$ is computed from their sum.

$\sum X = 517$ ; $n = 16$ ; $\therefore \bar{X} = \dfrac{\sum X}{n} = \dfrac{517}{16} = 32.3$ mm.

(ii) The deviation of each score from the mean, viz., $X - \bar{X}$, is worked out, recorded with its algebraic sign and squared (Table 4.5).

(iii) The sum of the squared deviations, $\sum (X - \bar{X})^2$, is used in working out the unbiased SD.

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{625.44}{16 - 1}} = 6.46 \text{ mm}.$$


Table 4.5. Table for computing the mean and SD of winglengths.

Serial No.   $X$    $X - \bar{X}$   $(X - \bar{X})^2$
 1           35     +2.7              7.29
 2           36     +3.7             13.69
 3           26     -6.3             39.69
 4           28     -4.3             18.49
 5           44     +11.7           136.89
 6           30     -2.3              5.29
 7           22     -10.3           106.09
 8           33     +0.7              0.49
 9           27     -5.3             28.09
10           25     -7.3             53.29
11           40     +7.7             59.29
12           44     +11.7           136.89
13           35     +2.7              7.29
14           31     -1.3              1.69
15           29     -3.3             10.89
16           32     -0.3              0.09
Total        517 ($\sum X$)         625.44 ($\sum (X - \bar{X})^2$)





Computation from grouped data of frequency distributions

This method applies to data arranged into frequency distributions with either equal or unequal class intervals. Because all scores of a class interval are assumed to be identical with its midpoint ($X_c$),

$$\sum X = \sum fX_c \; ; \quad \sum (X - \bar{X})^2 = \sum f(X_c - \bar{X})^2 \; ;$$

where $f$ and $X_c$ are respectively the frequencies and midpoints of the intervals.

(i) The $X_c$ of each interval is first worked out.

$X_c = \tfrac{1}{2}$[(upper limit) + (lower limit)]

(ii) The frequency ($f$) of each interval is multiplied by its $X_c$ and these products ($fX_c$) for all the intervals are totalled to give $\sum fX_c$.

(iii) The mean ($\bar{X}$) is then computed.

$$\bar{X} = \frac{\sum fX_c}{n}$$

(iv) The deviation of each $X_c$ from $\bar{X}$ is worked out and squared to give $(X_c - \bar{X})^2$.

(v) Each $(X_c - \bar{X})^2$ is multiplied by the $f$ of that interval and these products are totalled to give $\sum f(X_c - \bar{X})^2$.

(vi) The unbiased SD is then computed.

$$s = \sqrt{\frac{\sum f(X_c - \bar{X})^2}{n - 1}}$$


Example 4.4.4. 



Compute the mean and SD of body heights (cm) in the following distribution :

Class intervals :   156-160   161-165   166-170   171-175   176-180
Frequencies     :      4        14        25        11         6

Solution :

(i) After entering the data in Table 4.6, the midpoint $X_c$ of each interval is worked out from the upper and lower limits of that interval. For example, for the interval 161-165,

$$X_c = \frac{165 + 161}{2} = 163 \text{ cm}.$$

(ii) The frequency ($f$) of each interval is multiplied by its $X_c$ and these products ($fX_c$) are totalled for all the intervals to give $\sum fX_c$ which is used in computing the mean.

$$\bar{X} = \frac{\sum fX_c}{n} = \frac{10085}{60} = 168.1 \text{ cm}.$$

(iii) The difference between each $X_c$ and the mean is worked out and squared to get the $(X_c - \bar{X})^2$ of the relevant interval.


Table 4.6. Table for computation of mean and SD of heights.

Class intervals   $X_c$   $f$        $fX_c$   $X_c - \bar{X}$   $(X_c - \bar{X})^2$   $f(X_c - \bar{X})^2$
156-160           158      4           632    -10.1             102.01                 408.04
161-165           163     14          2282     -5.1              26.01                 364.14
166-170           168     25          4200     -0.1               0.01                   0.25
171-175           173     11          1903     +4.9              24.01                 264.11
176-180           178      6          1068     +9.9              98.01                 588.06
Total                     60 ($n$)   10085                                            1624.60




(iv) Each $(X_c - \bar{X})^2$ is multiplied by the frequency $f$ of that interval and such products of all the intervals are totalled to give $\sum f(X_c - \bar{X})^2$ which is used in working out the unbiased SD.

$$s = \sqrt{\frac{\sum f(X_c - \bar{X})^2}{n - 1}} = \sqrt{\frac{1624.60}{60 - 1}} = 5.25 \text{ cm}.$$
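The steps of the grouped-data method can be sketched in Python, using the body height distribution of Example 4.4.4. The function name `grouped_mean_sd` is an illustrative choice. The sketch keeps the unrounded mean throughout, so intermediate sums differ slightly from the tabulated ones although the rounded results agree.

```python
from math import sqrt


def grouped_mean_sd(intervals, freqs):
    """Mean and unbiased SD of a grouped distribution, using interval midpoints."""
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    n = sum(freqs)
    mean = sum(f * xc for f, xc in zip(freqs, midpoints)) / n
    sum_of_squares = sum(f * (xc - mean) ** 2 for f, xc in zip(freqs, midpoints))
    return mean, sqrt(sum_of_squares / (n - 1))


intervals = [(156, 160), (161, 165), (166, 170), (171, 175), (176, 180)]
freqs = [4, 14, 25, 11, 6]

mean, sd = grouped_mean_sd(intervals, freqs)
print(round(mean, 1), round(sd, 2))
```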


Example 4.4.5. 

Compute the mean and unbiased SD of the following distribution of Ca$^{2+}$ concentration scores (in mg per kg of extracellular fluid) in a sample of lobsters.

Class intervals :   11.6-13.0   13.1-14.5   14.6-16.0   16.1-17.5   17.6-19.0
Frequencies     :       7          13          20          14           6

Solution :

(i) After entering the data in Table 4.7, the midpoint $X_c$ of each interval is worked out from the limits of that interval. For example, for the interval 14.6-16.0,

$X_c = \tfrac{1}{2}$[(lower limit) + (upper limit)] $= \tfrac{1}{2}(14.6 + 16.0) = 15.3$.

(ii) The frequency ($f$) of each interval is multiplied by its $X_c$ and the sum of such products of all the intervals, viz., $\sum fX_c$, is used in working out the mean.

$$\bar{X} = \frac{\sum fX_c}{n} = \frac{916.5}{60} = 15.3 \text{ mg}.$$

(iii) The difference between each $X_c$ and the mean is worked out and squared to get the $(X_c - \bar{X})^2$ of the relevant interval.

(iv) Each $(X_c - \bar{X})^2$ is multiplied by the frequency $f$ of that interval and such products of all the intervals are totalled to give $\sum f(X_c - \bar{X})^2$ which is used in working out the unbiased SD.

$$s = \sqrt{\frac{\sum f(X_c - \bar{X})^2}{n - 1}} = \sqrt{\frac{177.75}{60 - 1}} = 1.74 \text{ mg}.$$


Table 4.7. Table for computing mean and SD of Ca$^{2+}$ concentrations.

Class intervals   $X_c$   $f$        $fX_c$   $X_c - \bar{X}$   $(X_c - \bar{X})^2$   $f(X_c - \bar{X})^2$
11.6-13.0         12.3     7          86.1    -3                9                      63
13.1-14.5         13.8    13         179.4    -1.5              2.25                   29.25
14.6-16.0         15.3    20         306.0     0                0                       0
16.1-17.5         16.8    14         235.2    +1.5              2.25                   31.50
17.6-19.0         18.3     6         109.8    +3                9                      54
Total                     60 ($n$)   916.5                                            177.75


Computation from simple frequency table 

Where $n$ number of scores are arranged individually, without any grouping, in a simple frequency table, the unbiased SD may be computed by using the frequency $f$ of each score and the mean $\bar{X}$ of the scores.

$$s = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}}$$




Example 4.4.6.

Arrange the following body heights (cm) in a simple frequency table and compute their SD :

180, 165, 170, 162, 176, 167, 180, 162, 165, 1…

Solution :

(i) After entering the individual scores ($X$) and their respective frequencies ($f$) in Table 4.8, the mean $\bar{X}$ is computed from the sum of the products ($fX$) of the scores and their frequencies.

$$\bar{X} = \frac{\sum fX}{n} = \frac{2032}{12} = 169.3 \text{ cm}.$$

Total :   $n = 12$ ;   $\sum fX = 2032$ ;   $\sum f(X - \bar{X})^2 = 442.68$.

(ii) The difference between each score and $\bar{X}$ is computed and squared to get $(X - \bar{X})^2$ which is multiplied by the frequency $f$ of that score. Such products for all the scores are totalled to give $\sum f(X - \bar{X})^2$ which is used in working out the unbiased SD.

$$s = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}} = \sqrt{\frac{442.68}{12 - 1}} = 6.34 \text{ cm}.$$


4.5 VARIANCE 

Variance or mean square ($s^2$ or MS) is the arithmetic mean of the squared deviations of individual scores from the mean. It is a good absolute measure of dispersion, identical with the squared SD as well as the second central moment (§ 4.6) ; this is why SD is known as the root-mean-square about the mean. Variance is in squared units like cm$^2$ and kg$^2$. It is used in the analysis of variance (ANOVA tests).

The sample variance is generally computed as follows for ungrouped scores :

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{n\sum X^2 - (\sum X)^2}{n(n - 1)}$$

where $\sum (X - \bar{X})^2$ is the sum of squared differences of scores from the sample mean and is known as the sum of squares, and $\sum X^2$ is the sum of the squared scores.

For a frequency distribution grouped into regular class intervals, variance is computed as the squared SD by the following formula :

$$s^2 = \frac{\sum f(X_c - \bar{X})^2}{n - 1}$$

where $f$ and $X_c$ are respectively the frequencies and midpoints of the intervals. (For comparison with SD, see § 4.4.)

The parametric or population variance has the symbol $\sigma^2$.




Example 4.5.1. 


Compute the variance and SD of the following housefly winglengths (mm) :

3.5, 4.8, 4.3, 3.4, 5.1, 4.2, 3.8, 4.5, 3.6, 5.0, 3.4, 4.4, 5.3, 3.7, 4.0, 3.3.

Solution :

(a) After entering the scores ($X$) in Table 4.9, each $X$ score is squared and the squared scores ($X^2$) are also entered in the table. Both $X$ and $X^2$ scores are totalled to give $\sum X$ and $\sum X^2$ respectively, which are used in working out the variance ($s^2$) and the unbiased SD.

$$s^2 = \frac{n\sum X^2 - (\sum X)^2}{n(n - 1)} = \frac{16 \times 281.23 - 66.3^2}{16(16 - 1)} = 0.433 \text{ mm}^2 \; ; \quad s = \sqrt{s^2} = \sqrt{0.433} = 0.658 \text{ mm}.$$

(b) Alternatively :

(i) The $X$ scores are tabulated in Table 4.10 and the mean $\bar{X}$ is computed from their sum $\sum X$.

$\sum X = 66.3$ ; $n = 16$ ; $\therefore \bar{X} = \dfrac{\sum X}{n} = \dfrac{66.3}{16} = 4.14$ mm.

(ii) The difference of each score from the mean, viz., $X - \bar{X}$, is worked out, recorded in the table and squared to give the $(X - \bar{X})^2$ scores.

(iii) The sum of these squared differences, viz., $\sum (X - \bar{X})^2$, is used in working out the variance.

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{6.4996}{16 - 1} = 0.433 \text{ mm}^2 \; ; \quad s = \sqrt{s^2} = \sqrt{0.433} = 0.658 \text{ mm}.$$



Table 4.10. Table for computing variance of housefly winglengths from their sum of squares.

Serial No.   $X$    $X - \bar{X}$   $(X - \bar{X})^2$
 1           3.5    -0.64           0.4096
 2           4.8    +0.66           0.4356
 3           4.3    +0.16           0.0256
 4           3.4    -0.74           0.5476
 5           5.1    +0.96           0.9216
 6           4.2    +0.06           0.0036
 7           3.8    -0.34           0.1156
 8           4.5    +0.36           0.1296
 9           3.6    -0.54           0.2916
10           5.0    +0.86           0.7396
11           3.4    -0.74           0.5476
12           4.4    +0.26           0.0676
13           5.3    +1.16           1.3456
14           3.7    -0.44           0.1936
15           4.0    -0.14           0.0196
16           3.3    -0.84           0.7056
Total        66.3 ($\sum X$)        6.4996 ($\sum (X - \bar{X})^2$)
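The results of Example 4.5.1 can be cross-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use $(n - 1)$ as the denominator. This is offered only as a check on the hand computation, not as the method of the text.

```python
from statistics import stdev, variance

winglengths = [3.5, 4.8, 4.3, 3.4, 5.1, 4.2, 3.8, 4.5,
               3.6, 5.0, 3.4, 4.4, 5.3, 3.7, 4.0, 3.3]

s2 = variance(winglengths)   # sample variance, denominator n - 1
s = stdev(winglengths)       # square root of the sample variance
print(round(s2, 3), round(s, 3))
```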


Example 4.5.2. 

Compute the variance and SD of the body height (cm) distribution of Example 4.4.4. 

Solution: 

(i) After entering the data in Table 4.11, the $X_c$ of each interval is worked out from the score limits of that interval. For example, for the interval 166-170,

$X_c = \tfrac{1}{2}$[(upper limit) + (lower limit)] $= \dfrac{170 + 166}{2} = 168$ cm.

(ii) The frequency $f$ of each interval is multiplied by its $X_c$ and these products ($fX_c$) are totalled for all the intervals to give $\sum fX_c$ which is used in computing the mean.

$$\bar{X} = \frac{\sum fX_c}{n} = \frac{10085}{60} = 168.1 \text{ cm}.$$


Table 4.11. Table for computation of variance from grouped data of body heights.

Class interval   $X_c$   $f$        $fX_c$   $X_c - \bar{X}$   $(X_c - \bar{X})^2$   $f(X_c - \bar{X})^2$
156-160          158      4           632    -10.1             102.01                 408.04
161-165          163     14          2282     -5.1              26.01                 364.14
166-170          168     25          4200     -0.1               0.01                   0.25
171-175          173     11          1903     +4.9              24.01                 264.11
176-180          178      6          1068     +9.9              98.01                 588.06
Total                    60 ($n$)   10085                                            1624.60


(iii) The difference between each $X_c$ and the mean is calculated and squared to give the $(X_c - \bar{X})^2$ of that interval.

(iv) Each $(X_c - \bar{X})^2$ is multiplied by the $f$ of that interval and these products for all the intervals are totalled to give $\sum f(X_c - \bar{X})^2$ which is used in working out the variance.

$$s^2 = \frac{\sum f(X_c - \bar{X})^2}{n - 1} = \frac{1624.60}{60 - 1} = 27.54 \text{ cm}^2 \; ; \quad s = \sqrt{s^2} = \sqrt{27.54} = 5.25 \text{ cm}.$$




4.6 CENTRAL MOMENTS

A central moment about the mean is the arithmetic average of the deviations of individual scores from the mean, each such deviation raised to a given power. Central moments are absolute measures of dispersion.

The first central moment ($m_1$) about the mean is computed from the sum of deviations, each raised to the power 1. It amounts to 0 for both symmetric and asymmetric distributions.

$$m_1 = \frac{\sum (X - \bar{X})}{n} = 0.$$

The second central moment ($m_2$) about the mean is the arithmetic average of the squared deviations of scores from the mean, and is identical with variance ($s^2$). It is expressed in squared units like cm$^2$ and kg$^2$. Its square root is the SD.

$$m_2 = \frac{\sum (X - \bar{X})^2}{n}.$$

The third and fourth central moments ($m_3$ and $m_4$) are computed as follows, and are expressed in units raised to the powers 3 and 4 respectively.

$$m_3 = \frac{\sum (X - \bar{X})^3}{n} \; ; \quad m_4 = \frac{\sum (X - \bar{X})^4}{n}.$$

The third and higher central moments of the odd order amount to 0 in symmetric distributions, but possess positive or negative values according as the distribution is positively or negatively skewed. So, $m_3$ and higher central moments of the odd order serve as measures of skewness or asymmetry. The central moments of the even order, viz., $m_2$ and $m_4$, are used in measuring the peakedness or kurtosis of a distribution.
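A minimal Python sketch of the central moments follows. The two small sets of scores and the function name `central_moment` are hypothetical. They are chosen only to show that $m_1$ vanishes, and that $m_3$ vanishes for a symmetric set but is positive for a positively skewed one.

```python
def central_moment(scores, k):
    """k-th central moment: average of the deviations from the mean raised to k."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** k for x in scores) / n


symmetric = [1, 2, 3, 4, 5]    # hypothetical symmetric scores
skewed = [1, 1, 2, 2, 9]       # hypothetical scores with a long upper tail

for k in (1, 2, 3, 4):
    print(k, central_moment(symmetric, k), central_moment(skewed, k))
```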

4.7 QUARTILE DEVIATION

The interquartile range of a frequency distribution extends from the first quartile ($Q_1$ or $P_{25}$) to the third quartile ($Q_3$ or $P_{75}$). Thus, it contains the middle 50% of the scores of a distribution. Quartile deviation ($Q$) or the semi-interquartile range constitutes half of this middle 50% range of scores. It is an absolute measure of dispersion, expressed in the same unit as the scores.

$$Q = \frac{Q_3 - Q_1}{2}.$$

Some of its properties are as follows.

(a) Because the lowest 25% and the highest 25% of the scores lie beyond the interquartile range from which $Q$ is computed, the latter is independent of the scores at the two tails of the distribution. Thus, $Q$ is unaffected by extreme scores at either tail of the distribution and can be computed even for incomplete distributions with open class intervals.

(b) $Q$ is not affected by scores other than $Q_3$ and $Q_1$. Thus, it gives no idea about the distribution and variability of other scores either within or beyond the interquartile range. This is a serious deficiency of $Q$.

(c) Kurtosis or peakedness of a frequency distribution is inversely related to $Q$. The smaller the $Q$, the greater is the concentration of scores at the middle of the distribution and the more is the distribution leptokurtic with a high peak and a narrow body. A large $Q$ shows a long interquartile range owing to a wider dispersal of scores of the middle order, making the distribution platykurtic with a low peak and a broad body.

(d) In unimodal and bilaterally symmetric distributions like the normal distribution, $Q_2$ or Mdn coincides with the mean and the mode at the centre of the distribution. In such cases, $Q_3$ and $Q_1$ are equidistant from $Q_2$. $Q$ then contains exactly 25% of the total scores on either side of the Mdn and equals $0.6745\sigma$. In other words, the range $\bar{X} \pm Q$ includes 50% of the scores in a symmetric unimodal distribution.

(e) In asymmetric distributions, one tail of the distribution is longer than the other, and $Q_1$ and $Q_3$ are no longer equidistant from $Q_2$ or Mdn. So, the midpoint of the interquartile range gets displaced towards the skewed tail. On the two sides of Mdn, $Q$ now covers unequal percentages of the total scores.

$Q$ is used in measuring skewness and kurtosis of a distribution.

Example 4.7.1.

Compute the quartile deviation of the following achievement test scores in a group of students.

Class intervals :   81-90   91-100   101-110   111-120   121-130   131-140
Frequencies     :     7       12        19        24        14         4

Solution :

(i) After entering the data in Table 4.12, the true lower limit ($X_l$) for each interval and the cumulative frequency ($cf$) up to its true upper limit ($X_u$) are computed. For example, for the 4th interval 111-120,

$X_l = \tfrac{1}{2}$[(lower score limit of the interval) + (upper score limit of the next lower interval)] $= \tfrac{1}{2}(111 + 110) = 110.5$.

$cf = f_1 + f_2 + f_3 + f_4 = 7 + 12 + 19 + 24 = 62$.

(ii) The size $i$ of the intervals is obtained by subtracting the lower score limit of any interval from that of the next higher one. Using the lower score limits of 101-110 and 111-120,

$i = 111 - 101 = 10$.

Table 4.12. Cumulative frequencies of achievement test scores. ($cf$ up to the respective $X_u$ scores)

Class intervals   $X_l$    $f$   $cf$
131-140           130.5     4    80 ($n$)
121-130           120.5    14    76
111-120           110.5    24    62
101-110           100.5    19    38
91-100             90.5    12    19
81-90              80.5     7     7

(iii) For computing $Q_1$ or $P_{25}$, the proportion $p$ of the total scores to be counted off, starting from the lowest interval, is 0.25.

$p = 0.25$ ; $n = 80$ ; $\therefore pn = 0.25 \times 80 = 20$.

On counting off 20 scores starting from the lowest interval, the interval 101-110 is reached, in which $P_{25}$ lies.

$X_l = 100.5$ ; $f_p = 19$ ; $cf_l = 19$ ; $i = 10$ ;

$$\therefore P_{25} \text{ or } Q_1 = X_l + i \times \frac{pn - cf_l}{f_p} = 100.5 + 10 \times \frac{20 - 19}{19} = 101.0.$$

(iv) For computing $Q_3$ or $P_{75}$,

$p = 0.75$ ; $n = 80$ ; $\therefore pn = 0.75 \times 80 = 60$.

On counting off 60 scores starting from the lowest interval, the interval 111-120 is reached, in which $P_{75}$ lies.

$X_l = 110.5$ ; $f_p = 24$ ; $cf_l = 38$ ; $i = 10$ ;

$$\therefore P_{75} \text{ or } Q_3 = X_l + i \times \frac{pn - cf_l}{f_p} = 110.5 + 10 \times \frac{60 - 38}{24} = 119.7.$$

(v) The quartile deviation is then computed.

$$Q = \frac{Q_3 - Q_1}{2} = \frac{119.7 - 101.0}{2} = 9.35.$$

Example 4.7.2. 

Compute the quartile deviation of the following distribution of cockroach winglength scores (mm).

Class intervals :   20-22   23-25   26-28   29-31   32-34
Frequencies     :     6      14      20      12       8

Solution :

(i) After entering the data in Table 4.13, the true lower limit ($X_l$) for each interval and the cumulative frequency ($cf$) up to its true upper limit ($X_u$) are worked out. For example, for the 3rd interval 26-28,

$X_l = \tfrac{1}{2}$[(lower score limit of the interval) + (upper score limit of the next lower interval)] $= \tfrac{1}{2}(26 + 25) = 25.5$.

$cf = f_1 + f_2 + f_3 = 6 + 14 + 20 = 40$.

(ii) The interval size ($i$) is obtained by subtracting the lower score limit of any interval from that of the next higher interval. Using the lower limits of 26-28 and 29-31,

$i = 29 - 26 = 3$.

(iii) For computing $Q_1$ or $P_{25}$, the proportion $p$ of the total scores to be counted off, starting from the lowest interval, is 0.25, which amounts to 15 scores.

$p = 0.25$ ; $n = 60$ ; $\therefore pn = 0.25 \times 60 = 15$.

On counting off 15 scores starting from the lowest interval, the interval 23-25 is reached, in which $P_{25}$ lies. The true lower limit ($X_l$), the cumulative frequency ($cf_l$) up to that $X_l$, and the frequency ($f_p$) in that interval are noted from Table 4.13 and used in working out $Q_1$ or $P_{25}$.

$X_l = 22.5$ ; $cf_l = 6$ ; $f_p = 14$ ; $i = 3$ ;

$$\therefore Q_1 \text{ or } P_{25} = X_l + i \times \frac{pn - cf_l}{f_p} = 22.5 + 3 \times \frac{15 - 6}{14} = 24.43 \text{ mm}.$$

Table 4.13. Cumulative frequencies of winglength scores. ($cf$ up to the respective $X_u$ scores)

Class intervals   $X_l$   $f$   $cf$
20-22             19.5     6     6
23-25             22.5    14    20
26-28             25.5    20    40
29-31             28.5    12    52
32-34             31.5     8    60 ($n$)




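(iv) For computing $Q_3$ or $P_{75}$, similarly,

$p = 0.75$ ; $n = 60$ ; $\therefore pn = 0.75 \times 60 = 45$.

On counting off 45 scores starting from the lowest interval, the interval 29-31 is reached, in which $P_{75}$ lies.

$X_l = 28.5$ ; $cf_l = 40$ ; $f_p = 12$ ; $i = 3$ ;

$$\therefore Q_3 \text{ or } P_{75} = X_l + i \times \frac{pn - cf_l}{f_p} = 28.5 + 3 \times \frac{45 - 40}{12} = 29.75 \text{ mm}.$$

(v) $Q_1$ and $Q_3$ are then used in computing the quartile deviation ($Q$).

$$Q = \frac{Q_3 - Q_1}{2} = \frac{29.75 - 24.43}{2} = 2.66 \text{ mm}.$$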
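The interpolation used in Examples 4.7.1 and 4.7.2 can be sketched in Python as follows. The function name `grouped_percentile` is an illustrative choice. The sketch assumes equal class intervals listed from the lowest upwards, and takes the true lower limit as midway between the lower score limit of an interval and the upper score limit of the interval below it.

```python
def grouped_percentile(intervals, freqs, p):
    """Point below which a proportion p of the scores lies, by
    X_l + i * (p*n - cf_l) / f_p within the interval that contains it."""
    n = sum(freqs)
    target = p * n
    size = intervals[1][0] - intervals[0][0]    # interval size i
    gap = intervals[1][0] - intervals[0][1]     # gap between adjacent score limits
    cf_below = 0                                # cumulative frequency cf_l
    for (lower, _upper), f in zip(intervals, freqs):
        if cf_below + f >= target:
            true_lower = lower - gap / 2        # true lower limit X_l
            return true_lower + size * (target - cf_below) / f
        cf_below += f
    raise ValueError("p must lie between 0 and 1")


# Cockroach winglength distribution of Example 4.7.2.
intervals = [(20, 22), (23, 25), (26, 28), (29, 31), (32, 34)]
freqs = [6, 14, 20, 12, 8]

q1 = grouped_percentile(intervals, freqs, 0.25)
q3 = grouped_percentile(intervals, freqs, 0.75)
q = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(q, 2))
```

Because the sketch carries unrounded quartiles into the final step, its result may differ in the last decimal place from a hand computation that rounds $Q_1$ and $Q_3$ first.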


4.8 COEFFICIENT OF VARIATION

Coefficient of variation (CV) is a relative measure of dispersion, obtained by expressing the SD as a percentage of the mean.

$$CV = \frac{s}{\bar{X}} \times 100.$$

Expressed as a percentage, CV is independent of any particular unit, and suitable in comparing variabilities of two variables measured in different units. Thus, (i) it can be used in comparing variabilities of physicochemical variables expressed in ratio scales (with true 0), such as serum cholesterol scores (mg) and arterial BP (mm Hg). (ii) It can also compare variabilities of two sets of scores in the same unit but with widely divergent means, e.g., femur lengths (cm) of giraffes and mice.

CV is not suitable (i) for psychological and educational data in interval scales (with no true 0 point) because no ratio can be computed with SD and mean in the latter scale, (ii) nor where the mean is close to 0 because CV may approach infinity there.


Example 4.8.1.

Find the relative variability of the following serum cholesterol scores ($X$) and systolic arterial pressure scores ($Y$) in a sample of humans.

Serum cholesterol : $\bar{X} = 164.6$ mg dL$^{-1}$; $s_X = 18.86$ mg.
Systolic pressure : $\bar{Y} = 128.6$ mm Hg; $s_Y = 13.74$ mm.

Solution :

(i) For serum cholesterol, $CV_X = \frac{s_X}{\bar{X}} \times 100 = \frac{18.86}{164.6} \times 100 = 11.46.$
(ii) For systolic pressure, $CV_Y = \frac{s_Y}{\bar{Y}} \times 100 = \frac{13.74}{128.6} \times 100 = 10.68.$
(iii) $\frac{CV_X}{CV_Y} = \frac{11.46}{10.68} = 1.07.$

So, variability of cholesterol scores is 1.07 times that of pressure scores.
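The same comparison can be sketched in Python. The helper name `coefficient_of_variation` is an assumption made for illustration; the numbers are those of Example 4.8.1.

```python
def coefficient_of_variation(sd, mean):
    """SD expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100

cv_x = coefficient_of_variation(18.86, 164.6)   # serum cholesterol
cv_y = coefficient_of_variation(13.74, 128.6)   # systolic pressure
ratio = cv_x / cv_y
print(round(cv_x, 2), round(cv_y, 2), round(ratio, 2))
```

The guard against a zero mean reflects the caution given in § 4.8 that CV is unsuitable where the mean is close to 0.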





4.9 COEFFICIENT OF QUARTILE DEVIATION

It is also a relative measure of dispersion. It is obtained by expressing the quartile deviation ($Q$) as a percentage of the median (Mdn or $Q_2$).

$CQD = \frac{Q}{Mdn} \times 100.$

Being independent of the units of raw scores, it can be used to compare variabilities of two sets of scores in different units.


Example 4.9.1.

Compare the variabilities of body weight and body height scores of a sample, using the following data.

Body weight : $Q_1 = 58.2$ kg; $Q_2 = 61.0$ kg; $Q_3 = 63.7$ kg.
Body height : $Q_1 = 153.4$ cm; $Q_2 = 166.2$ cm; $Q_3 = 176.2$ cm.

Solution :

(i) For body weight :

$Q_w = \frac{Q_3 - Q_1}{2} = \frac{63.7 - 58.2}{2} = 2.75 \text{ kg}.$

$CQD_w = \frac{Q_w}{Mdn} \times 100 = \frac{2.75}{61.0} \times 100 = 4.51.$

(ii) For body height :

$Q_h = \frac{Q_3 - Q_1}{2} = \frac{176.2 - 153.4}{2} = 11.4 \text{ cm}.$

$CQD_h = \frac{Q_h}{Mdn} \times 100 = \frac{11.4}{166.2} \times 100 = 6.86.$

As the CQD is higher in case of height, variability of height scores is much higher than that of weight in the sample.
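A minimal Python sketch of the same computation follows; the function name `coefficient_of_quartile_deviation` is hypothetical and the quartiles are those given in Example 4.9.1.

```python
def coefficient_of_quartile_deviation(q1, median, q3):
    """Quartile deviation expressed as a percentage of the median."""
    quartile_deviation = (q3 - q1) / 2
    return quartile_deviation / median * 100

cqd_weight = coefficient_of_quartile_deviation(58.2, 61.0, 63.7)     # kg
cqd_height = coefficient_of_quartile_deviation(153.4, 166.2, 176.2)  # cm
print(round(cqd_weight, 2), round(cqd_height, 2))
```

Because the units cancel in the ratio, the two results can be compared directly even though one set of scores is in kg and the other in cm.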


4.10 COEFFICIENT OF DISPERSION 

This is the ratio of variance and mean of a sample.

$CD = \frac{s^2}{\bar{X}}.$

Thus, it is a relative measure of dispersion, but bears the same unit as the scores.

Because CD bears the unit of the raw scores, it is not suitable in comparing the variabilities of two sets of scores expressed in two different units. However, it can be used for comparing variabilities of two sets of scores which have widely divergent means, but are given in the same unit. CD amounts to 1.00 in a distribution where events occur at random and independent of each other. But CD is less than 1 in repulsed distributions where the occurrence of one event reduces the probability of occurrence of a second similar event and consequently, the centre and the two tails of the distribution carry respectively higher and lower frequencies of scores than in a distribution having random occurrences of events. On the contrary, CD exceeds 1 in clumped distributions where the occurrence of one event increases the probability of occurrence of a second similar event and consequently, the centre and the tails of the distribution carry respectively lower and higher frequencies than in a random distribution of events.
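As a rough illustration, CD can be computed and read against the value 1 as follows. The counts are invented for this sketch (they are not data from the text), and the unbiased variance from Python's `statistics.variance` is assumed as $s^2$.

```python
import statistics

def coefficient_of_dispersion(scores):
    """Variance-to-mean ratio of a set of scores."""
    return statistics.variance(scores) / statistics.fmean(scores)

# Hypothetical counts of events per sampling unit (illustrative only)
counts = [2, 3, 1, 4, 2, 3, 2, 3]
cd = coefficient_of_dispersion(counts)

if cd < 1:
    pattern = "repulsed"
elif cd > 1:
    pattern = "clumped"
else:
    pattern = "random"
print(round(cd, 3), pattern)
```

With real data, a CD near 1 would seldom equal 1 exactly, so the three-way reading above is only a first description and not a test of significance.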




GLOSSARY 

central moments: absolute measures of dispersion worked out as the arithmetic averages of deviations of scores 
from their mean, each such deviation having been raised to a given power. 

coefficient of dispersion: a relative measure of dispersion given by the ratio between the variance and the mean 
of a sample. 

coefficient of quartile deviation : a relative measure of dispersion which is the quartile deviation expressed as 
a percentage of the median. 

coefficient of variation : a relative measure of dispersion obtained by expressing the standard deviation as a 
percentage of the mean. 

mean deviation : an absolute measure of dispersion given by the arithmetic average of the absolute differences of the individual scores of a sample from their mean, disregarding the algebraic signs of the differences.

measures of dispersion, absolute : statistics of dispersion such as standard deviation, variance and quartile deviation, which are computed directly from raw scores of a sample and bear the same units as those of the raw scores.

measures of dispersion, relative : statistics of dispersion such as the coefficients of variation and quartile deviation, which are obtained by expressing an absolute measure of dispersion relative to a central value.

quartile deviation : an absolute measure of dispersion given by half of the difference between the third and the first quartiles.

range : a measure of dispersion given by the interval from the lowest score to the highest.

statistics of dispersion : statistics like standard deviation, variance and coefficient of variation, which measure and express in numerical units the dispersion of scores of a variable around central values like the mean and the median in a sample.

sum of squares : the sum of squared differences of the scores of a variable from a central value like the mean.

variance : an absolute measure of dispersion given by the mean of squared differences of the scores of a sample from their mean.











5. SAMPLING STATISTICS 


The statistic of a variable varies from sample to
sample drawn from the same population, and
forms a frequency distribution around the 
corresponding population parameter. Sampling 
statistics such as standard errors serve as the 
measures of deviation of the statistics of a sample 
from the respective population parameters. A 
sampling statistic is also used in working out a 
confidence interval in which the corresponding 
parameter has a specified probability of falling. 
Sampling statistics also help in drawing 
generalized inferences for the entire population 
from the findings in one or more samples. 

5.1 SAMPLING ERRORS 

Each sample consists of only a limited number 
of individuals or cases drawn from a vast 
population. The cases and their scores vary from 
sample to sample ; even if the same individuals 
constitute successive samples, their scores are not
identical in successive measurements due to their 
temporal variations. Consequently, any statistic of 
a sample frequently differs from similar statistics 
of other samples from the same population. 
Although the statistics of a few samples may 
coincide with the population parameter, the 
statistics of many more samples differ from the 
parameter. Sampling error of a statistic is the 
difference between a statistic and the 
corresponding parameter. Its cause lies in the 
chance factors associated with the random 
sampling of only a limited number of individuals, 
in exclusion of others, from the vast population. 
Sampling error ($e$) may be negative or positive according as the parameter exceeds the statistic or falls short of the latter; thus, the sampling error of a sample mean would be : $e = \bar{X} - \mu$. For example, if the finite population of all the participants in an Olympiad has a mean vital capacity ($\mu$) of 5.8 litres and a random sample of the participants has a mean ($\bar{X}$) of 6.4 litres, the sampling error of $\bar{X}$ is given by :

$e = \bar{X} - \mu = 6.4 - 5.8 = +0.6$ litre.

Similarly, if the mean IQ on the Stanford Binet scale amounts to 120 for the entire finite population of IIT-JEE candidates, and to 116 for a small sample of those candidates, the sampling error of the mean of this sample is given by :

$e = \bar{X} - \mu = 116 - 120 = -4.$

Sampling errors may be estimated from the 
variations of the relevant statistic in samples drawn 
from the given population. 

5.2 SAMPLING DISTRIBUTIONS 

If many large samples of identical size n are 
drawn by random sampling from the same 
population and a particular statistic (say, the mean) 
is worked out from the scores of each sample, the 
computed values of the statistic are distributed in
a frequency distribution, called an experimental 
sampling distribution of that statistic, around the 
corresponding parameter. A similar frequency 
distribution of a statistic of different samples from 
a population of a known nature can also be 
constructed theoretically using the laws of 
probability ; this gives a theoretical sampling 
distribution of that statistic. Any sampling 
distribution is due to the varying sampling errors 
of the relevant statistic of different samples from 
the corresponding population parameter. 

The standard deviation of a sampling 
distribution of a statistic is a measure of the 
dispersion of the latter around the corresponding 
population parameter, and is called the standard 
error ( SE ) of that statistic. For example, the means 
of samples drawn from a population form a 




sampling distribution of means and the standard
deviation of this distribution is called the SE of
the mean ($s_{\bar{X}}$).

Sampling distributions can also be framed for 
the differences between the statistics of different 
samples drawn from the same or different 
populations. Standard deviation of such a 
sampling distribution is a measure of dispersion 
of differences between the statistics of each pair 
of samples around the difference between the 
corresponding population parameters; it is known 
as the SE of the difference between the relevant 
statistics. For example, differences between the 
means of every pair of samples from the same or 
two separate populations form a sampling 
distribution of differences between the means, and 
the standard deviation of this distribution is the 
SE of the difference between the means ($s_{\bar{X}_1 - \bar{X}_2}$).
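An experimental sampling distribution of means can be imitated by simulation. The sketch below is only illustrative: the population is artificial (normally distributed values with an assumed mean of 5.8 and SD of 0.9), and the sample size and number of samples are arbitrary choices. It draws many random samples of identical size, collects their means, and compares the standard deviation of those means with the population SD divided by $\sqrt{n}$.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the run is repeatable

# Artificial population (assumed values, for illustration only)
population = [random.gauss(5.8, 0.9) for _ in range(100_000)]
mu = statistics.fmean(population)

n = 50              # size of every sample
n_samples = 2000    # number of samples drawn

# Each call to random.sample draws n cases without replacement
means = [statistics.fmean(random.sample(population, n))
         for _ in range(n_samples)]

sampling_errors = [m - mu for m in means]      # e = sample mean - mu
se_empirical = statistics.pstdev(means)        # SD of the sample means
se_theory = statistics.pstdev(population) / n ** 0.5

print(round(se_empirical, 3), round(se_theory, 3))
```

The two printed values should be close; the sampling errors are scattered on both sides of 0, which is why their standard deviation, and not their average, is used as the SE.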

5.3 STANDARD ERRORS 

Standard error (SE) of a statistic is a measure of its sampling error which is the deviation of that statistic from the corresponding parameter. It is the standard deviation of the sampling distribution of the relevant statistic; the standard deviation of the theoretical sampling distribution is the parametric or true SE (e.g., $\sigma_{\bar{X}}$) while the standard deviation of an experimental sampling distribution is an estimate of the true SE and is computed from the observed sample data.

SE is computed for many sample statistics such as mean, standard deviation, proportion and correlation coefficient. It is used (i) in measuring the variability of a statistic between different samples of a population, (ii) in testing the significance of the relevant statistic, and (iii) in computing a confidence interval within which the population parameter has a specified probability of falling. (iv) The standard errors of the statistics of two samples, drawn from the same or different populations, may be used in computing the SE of the difference between such statistics.

The SE of the difference between the sample statistics of a particular type is the standard deviation of the sampling distribution of such differences around the difference between the parameters of the populations from which the samples have been drawn. It is a measure of the variability of such differences and is used in testing experimental hypotheses by finding the significance of difference between the statistics of two samples exposed to different levels of experimental treatments.

Standard error of the mean 

Means of samples from a population form a sampling distribution of means around the parametric mean $\mu$ of that population; the standard deviation of this distribution is the SE of the means. For a theoretical sampling distribution of means (§ 5.2), the standard deviation of the distribution is the true or population SE ($\sigma_{\bar{X}}$) of the means; the standard deviation of an experimental sampling distribution (§ 5.2) is an estimate of $\sigma_{\bar{X}}$ and is represented by $s_{\bar{X}}$.

$s_{\bar{X}}$ is a measure of the deviation of sample means from the population mean, and an index of the sampling error of means. It is used (i) in finding whether or not $\bar{X}$ is a dependable estimate of $\mu$, (ii) in working out a confidence interval in which $\mu$ has a specified probability of falling, and (iii) in computing the SE of the difference between the means of different samples for testing experimental hypotheses (chapter 7).

$s_{\bar{X}}$ can be conveniently estimated even by using a single sample; its value is inversely proportional to the square root of the size of the sample used.

(a) For a sample of size $n$, drawn from a finite population of size $N$ by simple random sampling without replacement :

$s_{\bar{X}} = \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}.$





(b) For a sample drawn from an infinite population ($N = \infty$) by simple random sampling without replacement :

(i) $s_{\bar{X}} = \frac{s}{\sqrt{n}}$, where $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$; but

(ii) $s_{\bar{X}} = \frac{s}{\sqrt{n - 1}}$, where $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$.

(c) In case the sample has been drawn by simple random sampling with replacement, irrespective of whether the population is finite or infinite,

(i) $s_{\bar{X}} = \frac{s}{\sqrt{n}}$, where $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$;

(ii) $s_{\bar{X}} = \frac{s}{\sqrt{n - 1}}$, where $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$.

(d) If the sample is drawn by stratified random sampling from a population with the means of its different strata differing negligibly, $s_{\bar{X}}$ is computed by the same formulae as used in case of simple random sampling.

(e) If the sample is drawn by stratified random sampling from a population having substantial differences between the means of its different strata,

$s_{\bar{X}} = \sqrt{\frac{s^2 + s_{st}^2}{N}}$,

where $s$ is the unbiased SD of the entire sample, $N$ is the total sample size and $s_{st}$ is the SD of the sample strata.

Where $X_1$, $X_2$, etc., are the raw scores of different strata, $\bar{X}_1$, $\bar{X}_2$, etc., are the respective stratum means, $n_1$, $n_2$, etc., are the respective stratum sizes, and $\bar{X}$ is the grand mean, $s_{st}^2$ is worked out as follows for this computation :

$s_{st}^2 = \frac{n_1(\bar{X}_1 - \bar{X})^2 + \dots + n_k(\bar{X}_k - \bar{X})^2}{N}$

$= \frac{1}{N}\left[\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots + \frac{(\sum X_k)^2}{n_k}\right] - \left[\frac{\sum X_1 + \sum X_2 + \dots + \sum X_k}{N}\right]^2.$

Example 5.3.1.

Compute the SE of the mean using the following frequency distribution of body heights (cm).

Class intervals :  156-160   161-165   166-170   171-175   176-180
Frequencies     :  4         14        25        11        6

Solution :

The frequency distribution is entered in Table 5.1. The midpoint ($X_c$) of each interval is computed using the two score limits of that interval. For example, for the interval 166-170,

$X_c = \frac{170 + 166}{2} = 168.$



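$\bar{X} = \frac{\sum fX_c}{n} = \frac{10085}{60} = 168.1 \text{ cm}.$

$s = \sqrt{\frac{\sum f(X_c - \bar{X})^2}{n - 1}} = \sqrt{\frac{1624.60}{60 - 1}} = 5.25 \text{ cm}.$

$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{5.25}{\sqrt{60}} = 0.678 \text{ cm}.$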


Table 5.1. Frequency distribution of body heights for computation of SE.

Class intervals   $X_c$   $f$      $fX_c$   $X_c - \bar{X}$   $(X_c - \bar{X})^2$   $f(X_c - \bar{X})^2$
156-160           158     4        632      -10.1             102.01                408.04
161-165           163     14       2282     -5.1              26.01                 364.14
166-170           168     25       4200     -0.1              0.01                  0.25
171-175           173     11       1903     +4.9              24.01                 264.11
176-180           178     6        1068     +9.9              98.01                 588.06
Total                     60 (n)   10085                                            1624.60
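The computation of Example 5.3.1 can be reproduced with a few lines of Python. This is a sketch using the midpoints and frequencies of Table 5.1; the variable names are illustrative. Because the code carries the unrounded mean and SD through every step, its SE may differ from the hand-worked value in the third decimal place.

```python
import math

midpoints = [158, 163, 168, 173, 178]   # X_c of each class interval
freqs = [4, 14, 25, 11, 6]              # f of each class interval

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Sum of squared deviations of the midpoints from the mean, weighted by f
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))

s = math.sqrt(sum_sq / (n - 1))         # unbiased SD
se_mean = s / math.sqrt(n)              # SE of the mean

print(round(mean, 1), round(s, 2), round(se_mean, 3))
```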


Example 5.3.2.

Compute the SE of the mean hemoglobin concentration (g dL$^{-1}$) of the following data in a stratified sample from a population divided on the basis of sex.

Men : 14.2, 14.6, 14.8, 14.0, 13.7, 14.8, 14.6, 13.8, 14.0, 14.9, 15.0.

Women : 12.4, 12.8, 12.6, 13.1, 12.5, 12.9, 13.4, 14.2, 12.9, 12.2.

Solution :

The data are arranged in Table 5.2. The stratum sizes in the sample are as follows :

Men : $n_1 = 11$. Women : $n_2 = 10$.

(a) The stratum means $\bar{X}_1$ and $\bar{X}_2$ and the grand mean $\bar{X}$ are computed.

$\bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{158.4}{11} = 14.4 \text{ g}$; $\bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{129.0}{10} = 12.9 \text{ g}.$

$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} = \frac{11 \times 14.4 + 10 \times 12.9}{11 + 10} = 13.7 \text{ g}.$

(b) Deviations of the scores of each stratum from the respective stratum mean are computed, squared and totalled (Table 5.2). These sums of squared deviations are used in computing the variance $s^2$ of the entire sample.

$s^2 = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 1} = \frac{2.22 + 2.98}{(11 + 10) - 1} = 0.26 \text{ g}^2.$

(c) The squared deviation of each stratum mean from $\bar{X}$ is then used in computing the variance $s_{st}^2$ of strata.

$s_{st}^2 = \frac{n_1(\bar{X}_1 - \bar{X})^2 + n_2(\bar{X}_2 - \bar{X})^2}{n_1 + n_2} = \frac{11(14.4 - 13.7)^2 + 10(12.9 - 13.7)^2}{11 + 10} = 0.56 \text{ g}^2.$





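Alternately :

$s_{st}^2 = \frac{1}{N}\left[\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2}\right] - \left[\frac{\sum X_1 + \sum X_2}{N}\right]^2 = \frac{1}{21}\left[\frac{158.4^2}{11} + \frac{129^2}{10}\right] - \left[\frac{158.4 + 129}{21}\right]^2 = 0.56 \text{ g}^2.$

(d) The SE ($s_{\bar{X}}$) of the mean is then computed using $s^2$ and $s_{st}^2$.

$s_{\bar{X}} = \sqrt{\frac{s^2 + s_{st}^2}{N}} = \sqrt{\frac{0.26 + 0.56}{21}} = 0.20 \text{ g}.$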


Table 5.2. Hemoglobin data for computing SE of a stratified sample.

                Men                                                  Women
$X_1$    $X_1 - \bar{X}_1$   $(X_1 - \bar{X}_1)^2$    $X_2$    $X_2 - \bar{X}_2$   $(X_2 - \bar{X}_2)^2$
14.2     -0.2                0.04                     12.4     -0.5                0.25
14.6     +0.2                0.04                     12.8     -0.1                0.01
14.8     +0.4                0.16                     12.6     -0.3                0.09
14.0     -0.4                0.16                     13.1     +0.2                0.04
13.7     -0.7                0.49                     12.5     -0.4                0.16
14.8     +0.4                0.16                     12.9     0.0                 0.00
14.6     +0.2                0.04                     13.4     +0.5                0.25
13.8     -0.6                0.36                     14.2     +1.3                1.69
14.0     -0.4                0.16                     12.9     0.0                 0.00
14.9     +0.5                0.25                     12.2     -0.7                0.49
15.0     +0.6                0.36
$\sum$ 158.4                 2.22                     129.0                        2.98
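Steps (a) to (c) of Example 5.3.2 can be checked with the Python sketch below, which also confirms that the two expressions for $s_{st}^2$ give the same value. The variable names are illustrative; the scores are those of Table 5.2, and the deviations in step (b) are taken from the respective stratum means, as tabulated there.

```python
men = [14.2, 14.6, 14.8, 14.0, 13.7, 14.8, 14.6, 13.8, 14.0, 14.9, 15.0]
women = [12.4, 12.8, 12.6, 13.1, 12.5, 12.9, 13.4, 14.2, 12.9, 12.2]
strata = [men, women]

total_n = sum(len(s) for s in strata)                  # N
grand_mean = sum(sum(s) for s in strata) / total_n
stratum_means = [sum(s) / len(s) for s in strata]

# (b) squared deviations of scores from their own stratum mean
within_ss = sum((x - m) ** 2
                for s, m in zip(strata, stratum_means)
                for x in s)
s2 = within_ss / (total_n - 1)

# (c) variance of strata, from the stratum means
s_st2 = sum(len(s) * (m - grand_mean) ** 2
            for s, m in zip(strata, stratum_means)) / total_n

# Alternate form, from the stratum totals
s_st2_alt = (sum(sum(s) ** 2 / len(s) for s in strata) / total_n
             - grand_mean ** 2)

print(round(s2, 2), round(s_st2, 2), round(s_st2_alt, 2))
```

The alternate form avoids computing each stratum mean, which is convenient for hand calculation when only the stratum totals are at hand.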



Example 5.3.3.

Compute the SE of the mean, variance and the unbiased SD of the following distribution of housefly winglength scores (mm × 10$^{-1}$).

Class intervals :  34-37   38-41   42-45   46-49   50-53
Frequencies     :  4       8       15      7       6

Solution :

After entering the frequency distribution in Table 5.3, the midpoint $X_c$ of each interval is worked out using the two score limits of that interval. For example, for the interval 42-45,

$X_c = \frac{1}{2}[(\text{upper score limit}) + (\text{lower score limit})] = \frac{1}{2}(45 + 42) = 43.5.$





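Total : $n = 40$; $\sum fX_c = 1752.0$.

$\bar{X} = \frac{\sum fX_c}{n} = \frac{1752.0}{40} = 43.8 \times 10^{-1} \text{ mm}.$

$s^2 = \frac{\sum f(X_c - \bar{X})^2}{n - 1} = \frac{876.40}{40 - 1} = 22.47 \times 10^{-2} \text{ mm}^2.$

$s = \sqrt{\frac{\sum f(X_c - \bar{X})^2}{n - 1}} = \sqrt{\frac{876.40}{40 - 1}} = 4.74 \times 10^{-1} \text{ mm}.$

$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{4.74}{\sqrt{40}} = 0.749 \times 10^{-1} \text{ mm}.$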


Standard error of proportions

Where the population is divided into two classes with respect to a variable, and $p$ and $q$ are the proportions of cases of the respective classes in a sample of size $n$, the SE of the proportion $p$ is worked out as follows.

$s_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{pq}{n}}.$

Standard error of difference between means

If pairs of samples are drawn repeatedly, the two samples of each pair coming from two different populations, and the difference between the sample means of every pair is worked out, these differences ($\bar{X}_1 - \bar{X}_2$) between the sample means would form a sampling distribution around the difference ($\mu_1 - \mu_2$) between the respective population means. For samples from the same population instead of two different ones, the sampling distribution of the difference ($\bar{X}_1 - \bar{X}_2$) between their means would have a mean of 0, because $\mu_1$ equals $\mu_2$ in that case.

The standard deviation of any such sampling distribution of ($\bar{X}_1 - \bar{X}_2$) is the SE of the difference between the sample means, $s_{\bar{X}_1 - \bar{X}_2}$. It is a measure of the deviations of differences between sample means ($\bar{X}_1 - \bar{X}_2$) from a mean difference ($\mu_D$) which equals either ($\mu_1 - \mu_2$) or 0, according as the samples have come from different populations or from an identical one. $s_{\bar{X}_1 - \bar{X}_2}$ is used in testing the significance of difference between two sample means in an experiment, and is computed from the standard errors ($s_{\bar{X}_1}$ and $s_{\bar{X}_2}$) of the respective sample means and the respective sample sizes.

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$





Example 5.3.4.

The heights of 60 men and 50 women had unbiased SDs of 36.7 and 40.6 cm respectively. Compute the SE of the difference between mean heights of men and women.

Solution :

(a) For men : $n_1 = 60$; $s_1 = 36.7$ cm; $\therefore s_{\bar{X}_1} = \frac{s_1}{\sqrt{n_1}} = \frac{36.7}{\sqrt{60}} = 4.74$ cm.

(b) For women : $n_2 = 50$; $s_2 = 40.6$ cm; $\therefore s_{\bar{X}_2} = \frac{s_2}{\sqrt{n_2}} = \frac{40.6}{\sqrt{50}} = 5.74$ cm.

(c) $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{(s_{\bar{X}_1})^2 + (s_{\bar{X}_2})^2} = \sqrt{(4.74)^2 + (5.74)^2} = 7.44$ cm.

Example 5.3.5.

The unbiased standard deviations of the annual milk yields (in L × 10$^2$) of 49 two-year-old Tharparkar cows and 64 two-year-old Jersey cows are found to be 3.93 and 5.58 respectively. Compute the SE of the difference between the mean milk yields of the two breeds.

Solution :

(a) For Tharparkar cows : $n_1 = 49$; $s_1 = 3.93$; $s_{\bar{X}_1} = \frac{s_1}{\sqrt{n_1}} = \frac{3.93}{\sqrt{49}} = 0.561.$

(b) For Jersey cows : $n_2 = 64$; $s_2 = 5.58$; $s_{\bar{X}_2} = \frac{s_2}{\sqrt{n_2}} = \frac{5.58}{\sqrt{64}} = 0.698.$

(c) Using the respective SE scores of the two means,

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{0.561^2 + 0.698^2} = 0.896.$
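Both examples follow the same two steps, which may be sketched in Python as below. The helper names `se_mean` and `se_difference` are assumptions made for this illustration. Since the code does not round the intermediate SE values, its last decimal may differ slightly from the hand-worked figures.

```python
import math

def se_mean(sd, n):
    """SE of a sample mean from the unbiased SD and the sample size."""
    return sd / math.sqrt(n)

def se_difference(se_1, se_2):
    """SE of the difference between two independent sample means."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)

# Example 5.3.4: heights of 60 men and 50 women
se_heights = se_difference(se_mean(36.7, 60), se_mean(40.6, 50))

# Example 5.3.5: milk yields of 49 and 64 cows of two breeds
se_milk = se_difference(se_mean(3.93, 49), se_mean(5.58, 64))

print(round(se_heights, 2), round(se_milk, 3))
```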


Standard error of difference between proportions

This is the standard deviation of the sampling distribution of differences between the proportions of cases of a given class in two samples. This $s_{p_1 - p_2}$ is a measure of the deviation of such a difference from the difference between the respective population proportions. It is used in testing the significance of a difference between two samples with respect to the proportions of a given type of cases.

(a) For samples from two different populations : Where $p_1$ and $p_2$ are the proportions of a given class in two samples, $s_{p_1}$ and $s_{p_2}$ are their respective SEs, and $q_1$ and $q_2$ equal $(1 - p_1)$ and $(1 - p_2)$ respectively,

$s_{p_1 - p_2} = \sqrt{s_{p_1}^2 + s_{p_2}^2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}.$

(b) For samples from the same population : The common proportions $p$ and $q$ for both samples are first computed for use in working out $s_{p_1 - p_2}$.

$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$; $q = 1 - p$;

$s_{p_1 - p_2} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}}.$



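Example 5.3.6.

Proportions of diabetics were found to be 0.32 and 0.22 in samples of 144 men and 121 women respectively. Compute the SE of the difference between the sexes in the proportions of diabetics.

Solution :

Where $p_1$ and $p_2$ are the proportions of diabetics, and $q_1$ and $q_2$ those of non-diabetics in men and women respectively,

$s_{p_1 - p_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}.$

(a) For men : $n_1 = 144$; $p_1 = 0.32$; $q_1 = 1 - p_1 = 1 - 0.32 = 0.68$;

$s_{p_1} = \sqrt{\frac{p_1 q_1}{n_1}} = \sqrt{\frac{0.32 \times 0.68}{144}} = 0.039.$

(b) For women : $n_2 = 121$; $p_2 = 0.22$; $q_2 = 1 - p_2 = 1 - 0.22 = 0.78$;

$s_{p_2} = \sqrt{\frac{p_2 q_2}{n_2}} = \sqrt{\frac{0.22 \times 0.78}{121}} = 0.038.$

(c) Using the values of $s_{p_1}$ and $s_{p_2}$,

$s_{p_1 - p_2} = \sqrt{0.039^2 + 0.038^2} = 0.054.$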
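The formulae for proportions can be sketched in Python as follows. The function names are hypothetical. The pooled version is included only to show formula (b) side by side with formula (a); it is not part of the worked example, which treats the two sexes as separate populations.

```python
import math

def se_proportion(p, n):
    """SE of a sample proportion p in a sample of size n."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_separate(p1, n1, p2, n2):
    """SE of (p1 - p2) for samples from two different populations."""
    return math.sqrt(se_proportion(p1, n1) ** 2 + se_proportion(p2, n2) ** 2)

def se_diff_pooled(p1, n1, p2, n2):
    """SE of (p1 - p2) for samples from the same population."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)   # common proportion
    q = 1 - p
    return math.sqrt(p * q / n1 + p * q / n2)

# Example 5.3.6: diabetics among 144 men and 121 women
se_men = se_proportion(0.32, 144)
se_women = se_proportion(0.22, 121)
se_separate = se_diff_separate(0.32, 144, 0.22, 121)
se_pooled = se_diff_pooled(0.32, 144, 0.22, 121)

print(round(se_men, 3), round(se_women, 3),
      round(se_separate, 3), round(se_pooled, 3))
```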


5.4 STANDARD SCORES

Raw scores are often transformed into standard scores for gaining a standard meaning, a common reference value and a comparability. Such transformations may be linear or nonlinear.

Linear transformations may change the zero point of the scale and the unit of measurement of the scores. These alter the mean and SD of the raw scores, but the differences between the transformed scores correspond closely to those between the respective raw scores in relative magnitude. So, the original shape, skewness and kurtosis of the distribution of raw scores remain unaltered in that of the transformed scores. Where $X_s$ is the transformed score, $\bar{X}_s$ and $s_s$ are respectively the mean and SD of the transformed scores, and $\bar{X}$ and $s$ respectively those of the raw scores, any raw score ($X$) may be linearly transformed by putting the desired values to $\bar{X}_s$ and $s_s$ in the following equation :

$X_s = \bar{X}_s + (X - \bar{X})\frac{s_s}{s}.$

The simplest standard score is the linearly transformed z score, standard deviate or relative deviate. It is a measure of deviation of a raw score from the mean in terms of the standard deviation.

On putting 0 as $\bar{X}_s$, and 1 as $s_s$ in the above-mentioned equation, the transformed score ($X_s$) is the z score. Thus

$z = 0 + (X - \bar{X})\frac{1}{s} = \frac{X - \bar{X}}{s}.$

Thus, the z score expresses the deviation of a
given raw score from the mean as so many times
the SD. Irrespective of the original unit of raw



scores, z scores are expressed in standard deviation units ($\sigma$ units). Evidently, the z score for the mean of the distribution amounts to 0. It is positive for any score higher than the mean, and negative for any score lower than the mean. Thus, a raw score which is higher than the mean by an amount equal to 1.65 times the SD, has the z score of $+1.65\sigma$; a raw score that is lower than the mean by 1.96 times the SD, has the z score of $-1.96\sigma$.

Being derived by linear transformation, the frequency distribution of z scores of a sample has the same shape and form as the original raw scores; only the unit is changed to $\sigma$ units and the zero point of the scale has also been altered. Evidently, the mean of the z score distribution amounts to 0 and is identical with the z score for the mean, while the SD of the distribution amounts to 1. So, z scores are eminently suitable for comparing the scores of the same or different distribution(s) expressed in different units or with widely different means and SDs.

Just like the raw scores, the sample mean may also be similarly transformed into the z score, using the SE of the mean ($s_{\bar{X}}$).

$z = \frac{\bar{X} - \mu}{s_{\bar{X}}}.$

The z score so computed expresses the deviation of the sample mean from the population mean in terms of the SE. Evidently, the z score for $\mu$, viz., $z = (\mu - \mu)/\sigma_{\bar{X}}$, amounts to 0, around which the z scores of the sample means form a sampling distribution identical in shape and form with the sampling distribution of the corresponding sample means; it may be recalled that z scores are linear transformations of raw scores and consequently have the same form of sampling distribution as those raw scores.

The z score is also computed as the standard score to express the deviation of the difference ($\bar{X}_1 - \bar{X}_2$) between two sample means from the difference ($\mu_1 - \mu_2$) between the parametric means of the two populations, from which the samples have been drawn. Where $s_{\bar{X}_1 - \bar{X}_2}$ is the estimated SE of such differences between sample means,

$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}.$

Where both the samples come from the same population,

$z = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}.$

The z score for the difference between two sample means finds application in testing the significance of such a difference observed in an experiment. However, because the unit of z scores is $1\sigma$ which is relatively large, the computed z score often has a value with decimal.

Nonlinear transformations change not only the mean and SD, but also the shape, skewness and kurtosis of the original raw score distribution. Nonlinear transformations such as logarithmic, reciprocal, square-root and arc-sine transformations are often tried for converting a non-normal distribution of raw scores into a normal distribution of transformed scores; in psychology, nonlinear transformations like percentile ranks, T scores, C scores, stanines and mental age scores are frequently used. (See the chapters on Analysis of variances and Psychological test construction for details).


Example 5.4.1.

The means and SDs of winglengths in a sample of houseflies and in one of mosquitoes are as follows :

Houseflies : Mean ($\bar{X}_1$) = 3.67 mm; SD ($s_1$) = 0.181 mm.

Mosquitoes : Mean ($\bar{X}_2$) = 2.12 mm; SD ($s_2$) = 0.124 mm.


Find the deviations of winglengths of a given housefly ($X_1$ = 3.23 mm) and a given mosquito ($X_2$ = 2.23 mm) from the respective means.

Solution :

Housefly : $z_1 = \frac{X_1 - \bar{X}_1}{s_1} = \frac{3.23 - 3.67}{0.181} = -2.43\sigma.$

Mosquito : $z_2 = \frac{X_2 - \bar{X}_2}{s_2} = \frac{2.23 - 2.12}{0.124} = +0.89\sigma.$

So, the winglength score of the given housefly is lower than the mean by $2.43\sigma$ units while that of the given mosquito is higher than its mean by $0.89\sigma$.
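The z scores of this example can be verified with a brief Python sketch; the function name `z_score` is an illustrative assumption.

```python
def z_score(x, mean, sd):
    """Deviation of a raw score from the mean, in SD units."""
    return (x - mean) / sd

z_housefly = z_score(3.23, 3.67, 0.181)
z_mosquito = z_score(2.23, 2.12, 0.124)
print(round(z_housefly, 2), round(z_mosquito, 2))
```

The sign of each result shows the direction of the deviation, so the two insects can be compared on a common scale although their raw winglengths differ widely.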


5.5 DEGREES OF FREEDOM

The degrees of freedom (df) of a statistic amount to that number of scores of a variable in a sample, which can be changed freely in magnitude and sign without causing any alteration in the values of all such statistics that have to be used in its computation as the estimates of the respective parameters.

In computing a statistic, one or more of the already computed statistics may be used as estimates of the respective population parameters. Serving as an estimate of a parameter, each such pre-computed statistic used in the computation must remain unchanged during the operation; this causes the loss of freedom of any one of the scores of the sample to undergo any change, because free changes of all other scores of the sample would determine or fix the only remaining score if the pre-computed statistic has to remain unaltered. As the total number of scores constitutes the sample size $n$, the df is lowered from $n$ by 1 for keeping each pre-computed statistic fixed or unchanged. Hence, the df of a statistic is often given by the sample size $n$ less the number $m$ of the pre-computed statistics used in its computation as estimates of parameters : $df = n - m$.

For example, the computation of SD of a sample needs the use of the sample mean $\bar{X}$ as an estimate of the population mean $\mu$; consequently, the df of $s$ is $(n - 1)$. The computation of the pooled SD ($s$) of two samples, having sizes $n_1$ and $n_2$, involves the similar use of two sample means $\bar{X}_1$ and $\bar{X}_2$ as the estimates of the respective parametric means of the populations from which the samples have been drawn. So, the df of $s$ amounts to $(n_1 + n_2 - 2)$.

If the computation of a statistic uses two or more pre-computed statistics as the estimates of parameters, its df equals the sum of the df values of all those statistics. For example, the SE of the difference between two means, viz., $s_{\bar{X}_1 - \bar{X}_2}$, is computed using the SDs ($s_1$ and $s_2$) of the two samples; because these SDs have degrees of freedom given by $(n_1 - 1)$ and $(n_2 - 1)$ respectively, $s_{\bar{X}_1 - \bar{X}_2}$ as well as the t score computed from the latter has the df of $(n_1 + n_2 - 2)$.
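The loss of one degree of freedom for a fixed mean can be made concrete with a small Python sketch. The scores and the fixed mean below are invented for illustration: once the mean is held constant, any $n - 1$ scores may be chosen freely, but the last one is then fully determined.

```python
fixed_mean = 10.0                       # pre-computed statistic to be kept unchanged
free_scores = [8.0, 12.5, 9.0, 11.0]    # hypothetical, freely chosen scores

n = len(free_scores) + 1
# The remaining score is fixed by the requirement that the mean stays the same
last_score = n * fixed_mean - sum(free_scores)

scores = free_scores + [last_score]
df = n - 1                              # one df lost for the one fixed statistic (m = 1)

print(last_score, sum(scores) / n, df)
```

Changing any of the free scores changes `last_score` correspondingly, which is the sense in which only $n - 1$ of the $n$ scores are free to vary.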


GLOSSARY 

degrees of freedom : for a statistic under consideration, that number of scores or cases of a sample which can be changed freely without affecting the values of all pre-computed statistics used in its computation as estimates of the respective parameters.




linear transformation : a transformation of raw scores of a variable into such scores as are free of the units of the raw scores and differ from the raw scores in mean and standard deviation, but retain the original shape of the distribution of the raw scores.

nonlinear transformation : a transformation of raw scores of a variable into such scores as are not only free of the units of the raw scores and different from them in mean and SD, but also have a distribution different in form from that of the raw scores.

sampling distribution : a frequency distribution of a given statistic of different samples around the parameter of the population from which the samples have been drawn.

sampling error : the difference between a statistic of a sample and the corresponding parameter of the population from which the sample has been drawn.

sampling statistic : a statistic such as the standard error which goes beyond a single sample to estimate sampling errors and to make inferences.

standard error : a statistic estimating the differences between the statistics of samples and the corresponding parameters of the population from which the samples have been drawn.

standard error of differences between means : a statistic to estimate the deviations of differences between sample means from the difference between parametric means of the populations from which those samples have been drawn.

standard error of differences between proportions: a statistic to estimate the deviations of differences between 
sample proportions from the difference between proportions in the populations from which the samples 

have been drawn. 

standard error of mean: a statistic to estimate the differences of sample means from the parametric mean of the 
population from which those samples have been drawn. 

standard error of proportions : a statistic to estimate the differences between proportions of cases of a given 
class in samples from the relevant proportion in the population. 

standard score : a linearly or nonlinearly transformed score, free of the unit of the raw score from which it has 
been worked out, and comparable with other similarly transformed scores for making inferences. 

z score : a linearly transformed score in SD ($\sigma$) unit, having a normal distribution if the corresponding raw scores have a normal distribution.







6. PROBABILITY DISTRIBUTIONS

[...] interpreted or used depending on the laws of probability. Probability distributions are theoretical distributions of relative frequencies [...] computed [...] data conforming to or violating [...]. Normal and t distributions are continuous distributions while binomial and Poisson distributions are discrete ones.

6.1 PROBABILITY 

Probability (P) of the occurrence of an event 
is the limit approached by the relative frequency 
of that type of event in an infinitely large number 
of observations or trials. If, in a vast set of trials 
or observations numbering n, the occurrence of a 
given type of event is expected to attain a limiting 
frequency f_e, the probability of that event is given 
by: 

$$P = \frac{f_e}{n}$$

But the observed frequency of the event may 
deviate widely from the expected limiting 
frequency if the total number n of events or trials 
is small. Nevertheless, its frequency approaches 
the expected limiting frequency with the increase 
in the number of trials or observations. The 
number of alternative events for each toss of a 
perfect coin amounts to two only, with either the 
head or the tail coming up. But if it is tossed only 
twice (n = 2), either the head or the tail may come 
up both the times in some cases while in other 
cases, both the head and the tail may come up 
once each. On the contrary, if an infinitely large 
number of tosses be performed, the number of 
times the head (or the tail) comes up will approach 
half the total number of tosses because there are 
only two alternative types of events of equal 
weightage. So, the limiting relative frequency or 
probability of this event of a given side coming 
up is given by 1/2 or 0.5. While choosing a single 
individual at random from a population having 
60% males and 40% females, the probability of a 
male being chosen amounts to 60/100 or 0.6. The 
probability of drawing a sickle cell anemia patient 
at random from a population with 5% sickle cell 
anemia cases would similarly come to 5/100 or 0.05. 
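
The approach of the relative frequency towards its limiting value can be illustrated with a small simulation. The sketch below is only an illustration: the function name, the fixed seed and the chosen numbers of tosses are assumptions made here, and a pseudo-random generator stands in for a perfect coin. 

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times; return heads / n_tosses."""
    # A fixed seed (an assumption of this sketch) keeps the run repeatable.
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


if __name__ == "__main__":
    # With few tosses the relative frequency may lie far from 0.5;
    # with many tosses it settles close to the limiting value.
    for n in (2, 20, 200, 2000, 200000):
        print(n, relative_frequency_of_heads(n))
```

With n = 2 the function can only return 0, 0.5 or 1, which is the wide deviation described above for small numbers of trials. 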

If the occurrence of an event is not influenced 
by the occurrence or non-occurrence of another 
event and vice versa, the two are mutually 
independent events ; thus, the second event has 
an unaltered probability of occurrence irrespective 
of whether the first event has occurred or not. 
Events that have an equal probability of 
occurrence are called equally likely events ; in a 
trial, any of such a set of events has the same 
chance of occurring. If two events influence and 
alter one another's probability of occurrence, they 
are dependent events ; dependent events give rise 
to either clumped or repulsed distributions (§ 
4.10). If the occurrence of one event is prevented 
by the occurrence of another event and vice versa, 
they cannot occur together and are called mutually 
exclusive events ; the probability of the 
simultaneous occurrence of two exclusive events 
amounts to 0 : P(1.2) = 0. Such events, at least 
one of which is sure to occur in a trial, constitute 
a set of exhaustive events. 

Addition theorem : 

This theorem states that the probability of 
occurrence of any one of k number of alternative 
and mutually exclusive events is given by the sum 
of the probabilities of their individual occurrences. 

P(1 or 2 or ... or k) = P(1) + P(2) + ... + P(k). 

For example, if one rat has to be chosen at 
random from amongst 10 rats, the probability of 
a given rat No.1 getting chosen in one attempt is 
given by : P(1) = 1/10 = 0.10. The probability of a 
second rat No.2 getting chosen in one attempt also 
amounts to 0.10 : P(2) = 0.10. The two choices are 
mutually exclusive as the choice of one nullifies 
the chance of the other being chosen by a single 
attempt. In such a case, the probability that either 
the rat No.1 or the rat No.2 will be drawn in one 
choice is given by : 

P(1 or 2) = P(1) + P(2) = 0.10 + 0.10 = 0.20. 

Multiplication theorem : 

This theorem states that the probability of 
combined (simultaneous or successive) 
occurrence of k number of independent events is 
given by the product of the probabilities of their 
separate individual occurrences. 

P(1.2.3...k) = P(1) x P(2) x P(3) x ... x P(k). 

For instance, the probability of drawing at 
random any given rat out of 10 rats by a single 
choice amounts to 0.10. Now, if three rats are 
chosen simultaneously or successively at random, 
replacing each chosen rat into the group before 
the next choice, the probability of choosing three 
particular rats, say, rat Nos. 1, 3 and 7, is given 
by: 

P(1.3.7) = P(1) x P(3) x P(7) 
         = 0.10 x 0.10 x 0.10 
         = 0.001. 

6.2 PROBABILITY DISTRIBUTIONS 

The relative frequency of scores or events in a 
class interval of a frequency distribution is 
obtained by dividing the frequency of that interval 
by the sample size n. If n is very large, this relative 
frequency (f/n) may be an estimate of the 
probability of scores or events occurring in that 
interval. Computation of probabilities of scores 
or events for all the class intervals of a frequency 
distribution changes the latter into a probability 
distribution which is a distribution of probabilities 
of occurrence of scores, events or cases among 
the classes of the given variable (Table 6.1). A 
probability curve may be graphically plotted with 
the scores of the variable scaled along the X axis 
and their probabilities along the Y axis. In this 
way, a probability distribution may be computed 
experimentally, using the observed frequencies of 
scores or events in the data of a test or experiment. 

Table 6.1. A probability distribution using observed frequencies. 

Class intervals    f          P = f/n 
60-64              18         0.075 
65-69              24         0.100 
70-74              42         0.175 
75-79              78         0.325 
80-84              30         0.125 
85-89              36         0.150 
90-94              12         0.050 
Total              240 (n)    1.000 
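
The conversion of the observed frequencies of Table 6.1 into probabilities can be sketched in a few lines. This is a minimal illustration only; the variable and function names are assumptions made here. Because the class intervals are mutually exclusive, the addition theorem of § 6.1 is also applied to two of them. 

```python
# Observed frequencies (f) of Table 6.1, keyed by class interval.
observed = {
    "60-64": 18, "65-69": 24, "70-74": 42, "75-79": 78,
    "80-84": 30, "85-89": 36, "90-94": 12,
}


def probability_distribution(frequencies):
    """Turn observed frequencies into relative frequencies P = f / n."""
    n = sum(frequencies.values())
    return {interval: f / n for interval, f in frequencies.items()}


probabilities = probability_distribution(observed)

# Class intervals are mutually exclusive, so the addition theorem applies:
# P(60-64 or 65-69) = P(60-64) + P(65-69).
p_below_70 = probabilities["60-64"] + probabilities["65-69"]

if __name__ == "__main__":
    for interval, p in probabilities.items():
        print(interval, round(p, 3))
    print("P(score below 70):", round(p_below_70, 3))
```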


Theoretical probability distributions, on the 
contrary, are computed theoretically on the basis 
of specific mathematical models and laws of 
probability. They are used widely in predicting 
probabilities of events and in testing experimental 
hypotheses. Examples include the normal 
distribution computed on the basis of the Gaussian 
equation, Student's t distribution based on 
Gossett's equation, the probability distribution of 
rare events in terms of Poisson's equation, and 
the binomial distribution based on the binomial 
equation. In any probability distribution, 
probabilities are in a continuous scale with no real 
gaps between them. But the scale for the events, 
cases or scores of the variable under consideration 
may be either continuous or discontinuous. Thus, 
a probability distribution of scores of a 
discontinuous variable (e.g., cell counts, litter 
sizes, family sizes, pulse rates and respiratory 
rates) or of cases in one of the discrete classes of 
a discontinuous variable (e.g., numbers of males 
or of mutant animals in samples) has real gaps in 
its scale for the scores or cases, and is called a 
discrete probability distribution ; binomial and 
Poisson distributions are such discrete probability 
distributions. On the contrary, a continuous 
probability distribution like the normal or the t 
distribution is a distribution of probabilities of a 
continuous variable with no gap in its scale of 
scores, such as body weights, femur lengths, blood 
sugar or serum iron concentrations and anxiety 
test scores. 

Fig. 6.1. Normal curve showing fractional areas 
between ordinates at different z scores. Note 
that the z score for μ is zero. 

6.3 NORMAL DISTRIBUTION 

Very often a specific form of a bilaterally 
symmetrical, unimodal, bell-shaped distribution, 
called the normal distribution curve, results on 
plotting the frequencies (f) of scores of a 
continuous measurement variable, observed in a 
very large sample, against the respective scores 
(X). A distribution of similar shape is obtained if 
the relative frequencies (f/n), obtained by dividing 
each observed frequency by the total frequency 
n, are plotted against the respective standard 
scores (z scores) computed from the raw X scores, 
because z scores are derived by a linear 
transformation of X scores (vide § 5.4); however, 
this distribution is called a normal probability 
distribution because its Y ordinate represents the 
relative frequencies or probabilities, instead of the 
observed frequencies, of the respective z scores 
as also of the corresponding X scores (Fig. 6.1). 

Properties 

1. The normal probability distribution gives the 
probable distribution of scores of a continuous 
measurement variable according to the laws of 
probability. It is thus a continuous probability 
distribution (§ 6.2). 

2. Normal probability distributions can be 
theoretically computed, using the Gaussian 
equation to work out the probabilities of z scores. 
Where e is the base of the natural logarithm and 
amounts to 2.71828, π is the ratio between the 
circumference and the diameter of a circle and 
amounts to 3.14159, z is given by (X - μ)/σ, and 
Y is the probability of the corresponding z score, 
the Gaussian equation formulated by Karl 
Friedrich Gauss (1777 - 1855) expresses Y as a 
function of z and so, of X. 

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2/2\sigma^2};$$

$$\text{or, } Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-z^2/2}.$$

Scaling Y along the ordinate and z along the 
abscissa, and plotting each computed Y against 
the corresponding z score, a normal curve is 
obtained with its peak at the z score for μ. 

3. An infinite number of normal curves may 
be plotted depending on the sample size, the mean 
and the SD. But for reference in all cases, the unit 
normal curve is the standard form. It is computed 
taking the sample size (n), the SD (σ) and the 
length (i) of the class intervals of the distribution 
as 1.00 each. The Gaussian equation of the unit 
normal curve thus turns into : 

$$Y = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$





4. According to the central limit theorem of 
probability, a variable has an almost normal 
distribution if its scores depend on the independent 
effects of many other variables, acting at random 
and having no interaction among themselves. 
Many characteristics of living organisms have 
normal or near-normal distribution in the 
population, because their scores result largely 
from the effects of numerous genetic and 
environmental factors. 

5. The z score for μ is the mean of the normal 
probability distribution and amounts to 0, because 

$$z = \frac{\mu - \mu}{\sigma} = 0.$$

It has already been stated above that the SD of 
the unit normal curve is taken as 1.00. 

6. The ordinate Y_0 at the mean or centre of the 
normal curve is the highest of all its ordinates. 
For the unit normal curve, this ordinate is 
represented by y_0 which measures 0.3989. The Y_0 
of any other normal curve (n ≠ 1, σ ≠ 1, i ≠ 1) can 
be computed from y_0. 

$$y_0 = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} = \frac{1}{\sqrt{2 \times 3.14159}}\, e^{0} = 0.3989;$$

$$Y_0 = y_0 \times \frac{in}{\sigma}.$$


7. The normal curve is unimodal and possesses 
a perfect bilateral symmetry around the single 
central peak. This makes the mean, median and 
mode coincide with the centre : μ = Mdn = Mo. 

8. Because of the bilateral symmetry, the first 
and third quartiles (Q_1 and Q_3) are equidistant 
from μ : Q_3 = μ + 0.6745σ ; Q_1 = μ - 0.6745σ. 
Thus, the range of the normal distribution from 
-0.6745σ to +0.6745σ covers the middle 0.5000 
of the total area of the distribution (Table 6.2); 
the quartile deviation Q covers exactly 0.2500 of 
that area and amounts to 0.6745σ. 

9. The normal distribution, being bilaterally 
symmetrical, is free from skewness ; its 
coefficient of skewness amounts to zero (§ 6.5). 

10. The normal distribution is taken as a 
standard for the degree of peakedness or kurtosis. 
It is mesokurtic ; its percentile coefficient of 
kurtosis is 0.263 and its moment coefficient is zero 
(§ 6.6). 

11. The normal curve has asymptotic tails, 
progressively nearing the abscissa or X axis, but 
not reaching the latter except at infinite distances 
from the centre. So, the normal probability 
distribution extends from -∞ to +∞. 

12. The total area of the normal curve 
represents the net frequency n. The latter is taken 
as 1 in case of the unit normal curve ; so, the area 
of the latter is assumed as 1.0000. The ordinate Y_0 
at the mean bisects the area of the normal curve 
into two equal halves ; so, each half of a unit 
normal curve has an area of 0.5000. 

13. The fraction of the total frequency n, lying 
between two specified z scores, is given by the 
area between the ordinates at those z scores and 
is expressed as a fraction of 1.0000 which 
corresponds to the entire area under the normal 
curve (Fig. 6.2). Thus, the proportions of scores 
or cases lying between the mean μ (z = 0) and 
different multiples of σ are given by the fractional 
areas between the ordinates at μ and the z scores 
corresponding to the respective σ multiples. 

14. Because of the bilateral symmetry of the 
unit normal curve, its fractional area between any 
two given z scores is identical in both halves of 
the curve. Thus, the fractional area between the z 
scores of +1 (i.e., μ + 1σ) and +2 (i.e., μ + 2σ) 
is identical with that between the z scores of 
-1 (i.e., μ - 1σ) and -2 (i.e., μ - 2σ), and amounts 
to 0.1359 (Fig. 6.1). 

15. Because the area in each half of the unit 
normal curve is 0.5000, the fractional area in one 
tail beyond a given z score is given by the 
difference between 0.5000 and the fractional area 
from the mean μ (z = 0) to the given z. This gives 
the probability P of cases having z scores equal 
to or beyond the given z in a single tail; thus, this 
fractional area in one tail beyond the given z gives 
the P of either (i) the cases with z scores higher 
than and equal to the given +z, or (ii) the cases 
with z scores equal to and lower than the given 
-z. So, for such one-tail probabilities, 

P = 0.5000 - (fractional area 
from the mean to the given z). 

A one-tail critical z score (z_α) is that z score 
beyond which lies a specified fractional area, also 
called the one-tail level of significance (α), in a 
single tail of the unit normal curve. Thus, the one- 
tail critical z_.05 is the z score of 1.65, beyond which 
the single tail of the unit normal curve has a 
fractional area of 0.05 (Fig. 6.2b, Tables 6.2 and 
7.1, § 7.3 and § 7.5). 

Fig. 6.2. Some fractional areas of the unit normal curve. 


16. Because the fractional areas in each half 
of the unit normal curve equal the corresponding 
ones in the other half, the total fractional area in 
the two tails beyond a given ± z score amounts to 
double the difference between 0.5000 and the 
fractional area from the mean (z = 0) to the given 
z in any one of the halves. This fractional area in 
two tails gives the probability P of the cases having 
z scores equal to or beyond the given z score in 
both halves of the normal curve. So, for such two- 
tail probabilities, 

P = 2[0.5000 - (fractional area from 
the mean to the given z score)]. 

A two-tail critical z score (z_α) is that z score, 
beyond whose positive and negative values lies a 
specified total fractional area (α), also called the 
two-tail level of significance, in two tails of the 
unit normal curve. For instance, the two-tail 
critical z_.05 is the z score of 1.96, beyond whose 
positive and negative values in the two tails lies a 
total fractional area of 0.05 (Tables 6.2 and 7.1, 
§ 7.3 and § 7.5). 

Table 6.2. Some fractional areas of the unit normal curve. 

z scores      Intervals        Fractional areas    Fractional areas and probabilities 
in σ units    (μ ± z score)    inside intervals    beyond intervals 
                                                   in one tail    in two tails 
0.6745        0 ± 0.6745σ      0.5000              0.2500         0.5000 
1.00          0 ± 1σ           0.6826              0.1587         0.3174 
1.65          0 ± 1.65σ        0.9010              0.0495         0.0990 
1.96          0 ± 1.96σ        0.9500              0.0250         0.0500 
2.58          0 ± 2.58σ        0.9902              0.0049         0.0098 
3.00          0 ± 3σ           0.9973              0.0014         0.0027 

17. The normal curve shows upward convexity 
over the interval μ ± 1σ, i.e., between the z scores 
of +1 and -1, but becomes concave over both the 
tails beyond that interval. The central fractional 
area between these two z scores amounts to 0.6826 
in the unit normal curve. Thus, 0.3413 of the total 
area lies between the ordinates at the mean and at 
the z score of either +1.00 or -1.00 (Fig. 6.1). 


18. Means of random samples from a normally 
distributed population form a normal sampling 
distribution with the population mean μ at its 
centre (sampling theory of means). 

19. Means of large samples from a population 
with a finite variance (σ²) have an almost normal 
sampling distribution around μ, even if the 
variable is non-normally distributed in the 
population (central limit theorem). 


Table A at the end of this book gives the 
ordinates (y) of a unit normal curve at different z 
(i.e., x/σ) scores as well as the fractional areas in 
one half of the curve from the mean μ (z = 0) to 
different z scores. 
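
The ordinates and fractional areas of the unit normal curve described in the properties above can also be computed directly instead of being read from a table. The following is a minimal sketch under stated assumptions: the function names are chosen here for illustration, and the fractional area from the mean to z is obtained from the error function of Python's standard `math` module rather than from Table A. 

```python
import math


def ordinate(z):
    """Height y of the unit normal curve at a given z score."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def area_mean_to_z(z):
    """Fractional area between the mean (z = 0) and the given z score."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))


def one_tail_p(z):
    """P = 0.5000 - (fractional area from the mean to the given z)."""
    return 0.5 - area_mean_to_z(z)


def two_tail_p(z):
    """P = 2[0.5000 - (fractional area from the mean to the given z)]."""
    return 2.0 * one_tail_p(z)


if __name__ == "__main__":
    print("y at z = 0:", round(ordinate(0.0), 4))
    # Columns: z, area inside the interval, one-tail P, two-tail P.
    for z in (0.6745, 1.00, 1.65, 1.96, 2.58, 3.00):
        inside = 2.0 * area_mean_to_z(z)
        print(z, round(inside, 4), round(one_tail_p(z), 4), round(two_tail_p(z), 4))
```

The printed rows may differ from Table 6.2 in the fourth decimal place, because table values are built from rounded half-areas. 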


6.4 BEST-FITTING NORMAL DISTRIBUTION 

It is that normal distribution which fits best 
with an observed distribution, and has the same 
mean (X̄), the same SD and the same sample size 
(n) as the latter. It can be computed as follows. 

(a) The observed scores are arranged in a 
continuous frequency distribution with class 
intervals of equal size (i). 

(b) The midpoint X_c of each class interval as 
well as X̄ and s of the sample is computed. 

(c) Each X_c is transformed into a z score. 

$$z = \frac{X_c - \bar{X}}{s}$$

(d) The unit normal curve table (Table A) is 
used to find the height y of the ordinate of the 
unit normal curve at each computed z score. 

(e) Because i, n and s amount to 1.00 each in 
case of the unit normal curve, the ordinate Y of 
the best-fitting normal curve, corresponding to 
each recorded y of the unit normal curve, is then 
computed using the i, n and s of the given 
frequency distribution. 

$$Y = y \times \frac{in}{s}$$

Each Y so computed gives the expected 
frequency (f_e) of the best-fitting normal 
distribution, for the class interval whose X_c 
corresponds to the z score for that Y. Thus, the 
computed values of Y constitute the distribution 
of f_e values for the best-fitting normal distribution 
(Table 6.4). 

(f) Each Y score may be graphically plotted 
against the X_c of the corresponding class interval 
for drawing the best-fitting normal curve. 


Example 6.4.1. 

Compute the best-fitting normal distribution for the following frequency distribution of serum iron concentration 
(μg dL⁻¹) of 80 humans. 

Class intervals : 100-109 110-119 120-129 130-139 140-149 150-159 160-169 

Frequencies : 6 11 10 17 16 13 7 


Solution : 

(a) The interval size (i) and the midpoints (X_c) of the class intervals are worked out and entered in Table 6.3. 
For example, for the class interval 120-129, 

X_c = 1/2 [(upper score limit) + (lower score limit)] = 1/2 (129 + 120) = 124.5. 
i = (lower score limit of an interval) - (lower score limit of the next lower interval) = 120 - 110 = 10. 

Table 6.3. Table for computing mean and SD of serum iron data. 

Class intervals   f        X_c      fX_c      X_c - X̄    (X_c - X̄)²   f(X_c - X̄)² 
100-109           6        104.5    627.0     - 31.6     998.56       5991.36 
110-119           11       114.5    1259.5    - 21.6     466.56       5132.16 
120-129           10       124.5    1245.0    - 11.6     134.56       1345.60 
130-139           17       134.5    2286.5    - 1.6      2.56         43.52 
140-149           16       144.5    2312.0    + 8.4      70.56        1128.96 
150-159           13       154.5    2008.5    + 18.4     338.56       4401.28 
160-169           7        164.5    1151.5    + 28.4     806.56       5645.92 
Σ                 80 (n)            10890.0                           23688.80 


(b) Each X_c is multiplied by the corresponding f to give fX_c, and the sum of the latter scores, viz., ΣfX_c, is 
used in working out X̄. 

$$\bar{X} = \frac{\sum fX_c}{n} = \frac{10890.0}{80} = 136.1.$$

(c) The deviation of each X_c from X̄ is squared and the squared deviation is multiplied by the corresponding 
f ; the sum of these products is used in computing SD. 

$$s = \sqrt{\frac{\sum f(X_c - \bar{X})^2}{n - 1}} = \sqrt{\frac{23688.80}{79}} = 17.32.$$

(d) The deviation of each X_c from X̄ is next transformed into z score which is entered in Table 6.4. For 
example, for the class interval 120-129, 

$$X_c - \bar{X} = -11.6;\quad z = \frac{X_c - \bar{X}}{s} = \frac{-11.6}{17.32} = -0.67.$$

Table 6.4. Computation of the best-fitting normal distribution for serum iron data. 

Class intervals   f     X_c      X_c - X̄    z         y         Y (f_e) 
100-109           6     104.5    - 31.6     - 1.82    0.0761    3.5 
110-119           11    114.5    - 21.6     - 1.25    0.1826    8.4 
120-129           10    124.5    - 11.6     - 0.67    0.3187    14.7 
130-139           17    134.5    - 1.6      - 0.09    0.3973    18.4 
140-149           16    144.5    + 8.4      + 0.48    0.3555    16.4 
150-159           13    154.5    + 18.4     + 1.06    0.2275    10.5 
160-169           7     164.5    + 28.4     + 1.64    0.1040    4.8 




(e) Neglecting the algebraic signs of the computed z scores, the height y of the ordinate at each z score is then 
recorded from the unit normal curve table (Table A). For example, for the z score of - 0.67, y = 0.3187. 

(f) The height Y of the ordinate of the best-fitting normal distribution is computed for each z score by multiplying 
its y score with in/s. For example, for the class interval 120-129, 

$$\frac{in}{s} = \frac{10 \times 80}{17.32} = 46.19;\quad Y = y \times \frac{in}{s} = 0.3187 \times 46.19 = 14.7.$$

The computed Y scores correspond to the expected frequencies (f_e) in the respective class intervals of the 
best-fitting normal distribution. 
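
The whole of Example 6.4.1 can be reproduced with a short routine. What follows is a sketch, not the book's own procedure in every detail: the function name and argument layout are assumptions, the ordinate y is computed from the Gaussian equation instead of being looked up in Table A, and unrounded values of X̄, s and z are carried through, so the results may differ slightly from the tabulated ones. 

```python
import math


def best_fitting_normal(lower_limits, frequencies, interval_size):
    """Return (mean, s, expected frequencies) for a grouped distribution.

    lower_limits are the lower score limits of the class intervals; each
    midpoint is taken as lower limit + (interval size - 1) / 2, so that
    the interval 120-129 has the midpoint 124.5.
    """
    n = sum(frequencies)
    midpoints = [low + (interval_size - 1) / 2 for low in lower_limits]
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
    s = math.sqrt(sum_sq / (n - 1))  # SD with n - 1, as in step (c)
    expected = []
    for x in midpoints:
        z = (x - mean) / s
        y = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # unit normal ordinate
        expected.append(y * interval_size * n / s)  # Y = y x in/s
    return mean, s, expected


if __name__ == "__main__":
    lows = [100, 110, 120, 130, 140, 150, 160]
    freqs = [6, 11, 10, 17, 16, 13, 7]
    mean, s, expected = best_fitting_normal(lows, freqs, 10)
    print(round(mean, 1), round(s, 2))
    print([round(fe, 1) for fe in expected])
```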


Example 6.4.2. 

Work out the best-fitting normal distribution for the frequency distribution of anxiety test scores of 255 high- 
school girls presented in the first two columns of Table 6.5. 

Solution : 

(a) The midpoints (X_c) and the size (i) of the class intervals are worked out. For example, for the class interval 
27-31, 

X_c = 1/2 [(upper score limit) + (lower score limit)] = 1/2 (31 + 27) = 29. 
i = (lower score limit of an interval) - (lower score limit of the next lower interval) = 32 - 27 = 5. 


Table 6.5. Table for computing mean and SD of anxiety score data. 

Class intervals   f         X_c    fX_c    X_c - X̄    (X_c - X̄)²   f(X_c - X̄)² 
52-56             2         54     108     + 29.6     876.16       1752.32 
47-51             5         49     245     + 24.6     605.16       3025.80 
42-46             6         44     264     + 19.6     384.16       2304.96 
37-41             6         39     234     + 14.6     213.16       1278.96 
32-36             21        34     714     + 9.6      92.16        1935.36 
27-31             54        29     1566    + 4.6      21.16        1142.64 
22-26             63        24     1512    - 0.4      0.16         10.08 
17-21             53        19     1007    - 5.4      29.16        1545.48 
12-16             33        14     462     - 10.4     108.16       3569.28 
7-11              11        9      99      - 15.4     237.16       2608.76 
2-6               1         4      4       - 20.4     416.16       416.16 
Σ                 255 (n)          6215                            19589.80 


(b) Each X_c is multiplied by the corresponding f to give fX_c and the sum of the latter scores for all the 
intervals, viz., ΣfX_c, is used in working out X̄. 

$$\bar{X} = \frac{\sum fX_c}{n} = \frac{6215}{255} = 24.4.$$




(c) The difference between each X_c and X̄ is squared and the squared difference is multiplied by the 
corresponding f ; the sum of these products is used in computing SD. 

$$s = \sqrt{\frac{\sum f(X_c - \bar{X})^2}{n - 1}} = \sqrt{\frac{19589.80}{254}} = 8.78.$$

(d) The difference between each X_c and X̄ is next transformed into z score which is entered in Table 6.6. For 
example, for the class interval 27-31, 

$$X_c - \bar{X} = 29 - 24.4 = +4.6;\quad z = \frac{X_c - \bar{X}}{s} = \frac{+4.6}{8.78} = +0.52.$$

(e) Neglecting the algebraic signs of the computed z scores, the height y of the ordinate at each of them is then 
recorded from the unit normal curve table (Table A). For instance, for the z score of 0.52, y = 0.3485. 

(f) The height Y of the ordinate of the best-fitting normal distribution is computed for each z score by multiplying 
its y score with in/s. For example, for the class interval 27-31, 

$$i = 5;\quad n = 255;\quad s = 8.78;\quad \frac{in}{s} = \frac{5 \times 255}{8.78} = 145.2;$$

$$Y = y \times \frac{in}{s} = 0.3485 \times 145.2 = 50.6.$$

The computed Y scores correspond to the expected frequencies (f_e) in the respective class intervals 
of the best-fitting normal distribution. 



Table 6.6. Computation of the best-fitting normal distribution for anxiety score data. 

Class intervals   f     X_c    X_c - X̄    z         y         Y (f_e) 
47-51             5     49     + 24.6     + 2.80    0.0079    1.1 
42-46             6     44     + 19.6     + 2.23    0.0332    4.8 
37-41             6     39     + 14.6     + 1.66    0.1006    14.6 
32-36             21    34     + 9.6      + 1.09    0.2203    32.0 
27-31             54    29     + 4.6      + 0.52    0.3485    50.6 
22-26             63    24     - 0.4      - 0.05    0.3984    57.8 
17-21             53    19     - 5.4      - 0.62    0.3292    47.8 
12-16             33    14     - 10.4     - 1.18    0.1989    28.9 
7-11              11    9      - 15.4     - 1.75    0.0863    12.5 
2-6               1     4      - 20.4     - 2.32    0.0270    3.9 


6.5 SKEWNESS 

In some frequency distributions, scores are 
concentrated at one end of the scale and are much 
fewer towards the other end. Such an asymmetric 
distribution has its mode towards the former end 
and a longer and more pointed tail at the other 
end. Such a distribution is called a skewed 
distribution. Skewness is the degree of asymmetry 
of the distribution. 






Fig. 6.3. Skewed distributions (negative skewness and positive skewness, with Mo, Mdn and X̄ marked). 

Properties of skewed distributions 

(a) A skewed distribution cannot be bisected 
into two symmetrical halves, because one of its 
tails is longer and more tapering than the other. 

(b) Skewness may be positive or negative 
according as the pointed longer tail rolls down to 
the high-value (right) end or the low-value (left) 
end of the scale, respectively (Fig. 6.3). The scores 
are more concentrated in the respective opposite 
halves of the distribution ; agewise frequency 
distributions for atherosclerosis, cataract, 
malignancy and prostatic enlargement are 
negatively skewed as their incidences rise in old 
age. 

(c) Mean, median and mode fail to coincide in 
an asymmetric distribution. Both median and 
mean are displaced from the mode towards the 
skewed tail, but the displacement of the mean 
considerably exceeds that of the median. So, 
X̄ > Mdn > Mo in positively skewed distributions 
while Mo > Mdn > X̄ in negatively skewed ones. 
Because the deflection of X̄ exceeds that of Mdn, 
the latter is more dependable of the two as a 
measure of central value in a skewed distribution. 

(d) Unlike symmetrical distributions where all 
odd-order central moments possess a zero value, 
m_3 and higher odd-order central moments have 
either positive or negative values in skewed 
distributions, their signs and magnitudes 
indicating respectively the direction and the degree 
of skewness. 

(e) Unlike symmetrical distributions where the 
first and third quartiles (Q_1 and Q_3) are equidistant 
from the second quartile (Q_2 or Mdn), the quartile 
on the side of the skewed tail lies farther from Q_2 
in an asymmetric distribution. Therefore, 
(Q_3 - Q_2) > (Q_2 - Q_1) in positively skewed 
distributions, and (Q_2 - Q_1) > (Q_3 - Q_2) in case 
of negative skewness. 


Measures of skewness 

(a) Pearson's first coefficient of skewness : 

$$Sk = \frac{\bar{X} - Mo}{s}$$

(b) Pearson's second coefficient of skewness : 

$$Sk = \frac{3(\bar{X} - Mdn)}{s}$$

The second coefficient is preferable to the first 
because of the frequent difficulty in estimating 
the mode of a distribution precisely. In symmetric 
distributions, X̄ = Mdn = Mo ; so, both the 
Pearson coefficients amount to zero in such 
distributions. 

(c) Bowley's quartile coefficient of skewness : 

$$Sk = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1}$$

In any symmetric distribution, Q_3 - Q_2 = 
Q_2 - Q_1 ; so, the quartile coefficient amounts to 
zero in such distributions. 

(d) Moment coefficient of skewness (γ_1) : 

$$\gamma_1 = \frac{m_3}{s^3};$$

where the third moment m_3 about the mean is 
divided by the third power of SD ; because m_3 
amounts to 0 for symmetric distributions, γ_1 
amounts to 0 for such distributions. 

All four coefficients are free from the unit of 
the variable. A positive or negative value indicates 
respectively a positive or negative skewness. The 
degree of skewness is given by the magnitude of 
the coefficient. 




Example 6.5.1. 

Compute the quartile coefficient of skewness for the frequency distribution of human body weights where 
Q_1, Q_2 and Q_3 amount respectively to 57.4, 60.3 and 62.8 kg. 

Solution : 

Q_1 = 57.4 kg ; Q_3 = 62.8 kg ; Q_2 = 60.3 kg. 

$$\therefore\ Sk = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{(62.8 - 60.3) - (60.3 - 57.4)}{62.8 - 57.4} = -0.074.$$


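Example 6.5.2. 

Compute Pearson's second coefficient of skewness for a frequency distribution of Aptitude Test scores 
having a mean of 98.69, an SD of 14.08 and a median of 98.25. 

Solution : 

X̄ = 98.69 ; s = 14.08 ; Mdn = 98.25. 

$$\therefore\ Sk = \frac{3(\bar{X} - Mdn)}{s} = \frac{3(98.69 - 98.25)}{14.08} = +0.094.$$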
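
The coefficients of skewness used in Examples 6.5.1 and 6.5.2 can be checked with a few one-line functions. This is a minimal sketch; the function names are assumptions chosen for illustration. 

```python
def pearson_first_skewness(mean, mode, sd):
    """Pearson's first coefficient: (mean - mode) / SD."""
    return (mean - mode) / sd


def pearson_second_skewness(mean, median, sd):
    """Pearson's second coefficient: 3(mean - median) / SD."""
    return 3 * (mean - median) / sd


def quartile_skewness(q1, q2, q3):
    """Bowley's quartile coefficient: [(Q3 - Q2) - (Q2 - Q1)] / (Q3 - Q1)."""
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


if __name__ == "__main__":
    # Example 6.5.1 (body weights, kg) and Example 6.5.2 (test scores).
    print(round(quartile_skewness(57.4, 60.3, 62.8), 3))
    print(round(pearson_second_skewness(98.69, 98.25, 14.08), 3))
```

A negative result indicates negative skewness and a positive result positive skewness; both example values are close to zero, so the two distributions are only slightly skewed. 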


6.6 KURTOSIS 

Kurtosis denotes the degree of peakedness of 
a frequency distribution, compared to that of the 
normal distribution. The normal distribution is said 
to be mesokurtic and its peakedness of a medium 
order is taken as the standard (Fig. 6.4). A 
leptokurtic distribution has a higher and sharper 
peak, thicker tails and a narrower body than the 
normal distribution. Thus, a leptokurtic 
distribution has higher frequencies of scores near 
its centre and at its two tails than a normal 
distribution with the same mean and variance, but 
has lower frequencies of scores of intermediate 
magnitudes. Student's t distributions are generally 
leptokurtic unless the sample is very large. A 
platykurtic distribution is flatter at its centre, 
broader in the body and thinner at the tails than 
the normal distribution because compared to the 
latter, the former carries lower frequencies of 
scores near its centre and at its tails, but higher 
frequencies of scores of intermediate magnitudes. 

Fig. 6.4. Different forms of kurtosis. 

Measures of kurtosis 

(a) Percentile coefficient of kurtosis (κ): 

$$\kappa = \frac{P_{75} - P_{25}}{2(P_{90} - P_{10})};\quad \text{or, } \kappa = \frac{Q}{P_{90} - P_{10}}$$

For mesokurtic, platykurtic and leptokurtic 
distributions, κ amounts respectively to 0.263, 
> 0.263, and < 0.263; the higher the value above 
0.263, the greater is the platykurtosis while the 
lower it is below 0.263, the greater is the 
leptokurtosis. Because the normal distribution is 
mesokurtic, κ amounts to 0.263 for it. On the 
contrary, t distributions are leptokurtic and have 
κ values lower than 0.263. 

(b) Moment coefficient of kurtosis (γ_2): 

$$\gamma_2 = \frac{m_4}{m_2^2} - 3 = \frac{m_4}{s^4} - 3;$$

where s is the SD, and m_2 and m_4 are respectively 
the second and fourth central moments. For 
mesokurtic, platykurtic and leptokurtic 
distributions, γ_2 amounts to zero, a negative value 
and a positive value, respectively. Thus, γ_2 
amounts to zero for the mesokurtic normal 
distribution, but has positive values for leptokurtic 
t distributions. 


Example 6.6.1. 

Compute the percentile coefficient of kurtosis for a frequency distribution of Differential Aptitude Test scores, 
having a 10th percentile of 80.60, a 25th percentile of 88.89, a 75th percentile of 108.15 and a 90th percentile of 
116.03. 

Solution : 

P_10 = 80.60 ; P_25 = 88.89 ; P_75 = 108.15 ; P_90 = 116.03. 

$$\therefore\ \kappa = \frac{P_{75} - P_{25}}{2(P_{90} - P_{10})} = \frac{108.15 - 88.89}{2(116.03 - 80.60)} = 0.272.$$

As κ exceeds 0.263, the distribution is slightly platykurtic. 
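
The percentile coefficient of kurtosis of Example 6.6.1, and the reference value of 0.263 for the normal distribution, can both be checked numerically. The sketch below is an illustration under stated assumptions: the function names are chosen here, and the percentiles of the unit normal curve are taken from `NormalDist` in Python's standard `statistics` module rather than from a printed table. 

```python
from statistics import NormalDist


def percentile_kurtosis(p10, p25, p75, p90):
    """Percentile coefficient: (P75 - P25) / [2(P90 - P10)]."""
    return (p75 - p25) / (2 * (p90 - p10))


def normal_reference_value():
    """Percentile coefficient of kurtosis of the unit normal curve."""
    unit_normal = NormalDist()  # mean 0, SD 1
    return percentile_kurtosis(
        unit_normal.inv_cdf(0.10),
        unit_normal.inv_cdf(0.25),
        unit_normal.inv_cdf(0.75),
        unit_normal.inv_cdf(0.90),
    )


if __name__ == "__main__":
    print(round(percentile_kurtosis(80.60, 88.89, 108.15, 116.03), 3))
    print(round(normal_reference_value(), 3))
```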


6.7 STUDENT’S t DISTRIBUTION 

The probability distribution of the scores (X) 
of a small sample (n < 30), drawn from a normally 
distributed population, differs from the normal 
distribution, but conforms to a probability 
distribution called the Student's t distribution. The 
latter was originally derived in 1907 by W.S. 
Gossett writing under the pseudonym of 
"Student". The t distribution is the probability 
distribution of the statistic t which, like z, is a 
standard score obtained by the linear 
transformation of any raw score (X) by dividing 
its difference from the mean with the SD. 

$$t = \frac{X - \bar{X}}{s}$$

A sample statistic such as the sample mean 
may also be linearly transformed into a t score by 
dividing the difference between the statistic and 
the corresponding population parameter, with the 
SE of that statistic. 

$$t = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

Like the scores of a small sample, the statistics 
of small samples, drawn from a normally 
distributed population, have probability 
distributions conforming to t distributions. 

The difference between two sample statistics
can also be transformed into a t score by dividing
the deviation of that difference from the difference
between the respective parameters by the SE
of the difference between those statistics. Where
$\bar{X}_1$ and $\bar{X}_2$ are the means of two samples drawn
from two normally distributed populations having
$\mu_1$ and $\mu_2$ as the respective parametric means,
$\sigma_{\bar{X}_1 - \bar{X}_2}$ is the SE of the difference between $\bar{X}_1$ and
$\bar{X}_2$, s is the pooled or common SD of the two
samples, and $n_1$ and $n_2$ are the respective sample
sizes,

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$

or,

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

If the samples have been drawn from the same
population, $\mu_1 = \mu_2$ ;

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$



The probability distribution of such 
differences between two sample statistics also 
conforms to the t distribution. 
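The two transformations described above can be sketched as follows. The function names and the numbers in the usage lines are hypothetical; the pooled SD is taken as a given input because the text does not state how it is computed.

```python
import math


def t_for_mean(sample_mean, population_mean, sd, n):
    """t score of a sample mean: (mean - mu) divided by the SE of the mean."""
    se_mean = sd / math.sqrt(n)
    return (sample_mean - population_mean) / se_mean


def t_for_difference(mean1, mean2, pooled_sd, n1, n2, mu_difference=0.0):
    """t score of a difference between two sample means.

    mu_difference is (mu1 - mu2); it is 0 when both samples come from
    the same population.
    """
    se_difference = pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return ((mean1 - mean2) - mu_difference) / se_difference


# Hypothetical figures, chosen only to show the calls.
print(t_for_mean(12.0, 10.0, 3.0, 9))
print(t_for_difference(15.0, 12.0, 4.0, 8, 8))
```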

Properties 

1. t distributions are continuous probability
distributions for small samples. The statistic t is
thus a small-sample statistic.

2. t distributions can be worked out
theoretically using Gosset's equation. Where
$Y_0$ is a constant depending on the degrees of
freedom (df) of the given t score, Y is the
probability of random occurrence of that t score, and
$\pi$ is the constant ratio between the circumference and
diameter of a circle, the Gosset equation
expresses Y as a function of t in the following
manner.

$$Y = \frac{Y_0}{\left(1 + \dfrac{t^2}{df}\right)^{(df+1)/2}} ; \qquad Y_0 = \frac{[(df-1)/2]!}{\sqrt{\pi\,df}\;[(df-2)/2]!}$$

The theoretical t distribution can be graphically 
drawn as a t distribution curve by plotting the 
computed Y scores, scaled along the ordinate axis, 
against the respective t scores scaled along the 
abscissa axis (Fig. 6.5). 


Fig. 6.5. Some t distributions for different 
degrees of freedom. 

3. The t distribution forms a bilaterally
symmetrical curve around a single central peak
and possesses no skewness. The mean, median and
mode coincide with the centre. Thus, the t
distribution has a mean of 0 which is the t score
for $\mu$. The standard deviation of the distribution
amounts to $\sqrt{df/(df-2)}$, where (df − 2) is a positive
integer.

4. There are different t distributions for the t
scores having different degrees of freedom. With
the rise in df (and hence, in the sample size), the t
distribution progressively approaches the normal
distribution in shape; there are negligible
differences between the two distributions when n
equals or exceeds 30, and almost no difference
when n approaches $\infty$. For large samples, the
equation for t distribution resembles that for the
unit normal curve :

$$Y = \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}$$

Hence, though a small-sample statistic, t is
applicable even to large samples because of the
close identity of normal and t distributions for
large samples.

5. In contrast to the normal distribution, t
distributions are leptokurtic, having percentile
coefficients of kurtosis lower than 0.263.
Leptokurtosis rises with the fall in the df and
hence, with the reduction in the sample size. When
df approaches infinity, the decline in leptokurtosis
causes the t distribution to become mesokurtic and
to coincide with the normal distribution.

6. Like the normal distribution, t distributions
are asymptotic with their tails extending to $-\infty$
and $+\infty$.

7. Because probability distributions are
distributions of relative frequencies, the total area
under any t distribution curve, representing the
total frequency n, is taken as 1.00.

8. The fractional area of a t distribution curve
between the ordinates at any two t scores on the
X axis gives that proportion of the total cases
which falls between those given t scores. However,
as t distributions vary in shape according to the
df, such a range between the t scores as includes a
specific fractional area like 0.95, varies with the
df.


9. When a t score has been computed for the
difference between two sample means, the sum
of the fractional areas of the relevant t distribution
in its two tails, beyond respectively the negative
and positive values of the computed t, gives the
probability P of obtaining by mere chance the
observed difference between the sample means,
irrespective of the algebraic sign of the difference.

10. A critical t score ($t_\alpha$) is the highest t score
with a given df, up to which the observed results
(e.g., an observed difference between two sample
means) have a specified probability P of occurring
by mere chance. It is also the t score beyond which
lies that specific fractional area in the tail(s) of
the t distribution for a given df, as equals the
chosen significance level ($\alpha$) (§ 7.3 and 7.5, Table …).

The t distributions have wide applications in
finding the significance of the observed results
and in computing the confidence intervals of
population means (See Table 6.7).


6.8 CONFIDENCE INTERVALS

The confidence interval of a parameter is the
range of scores within which the parameter has a
given probability of lying. This probability is
called the fiducial probability. The two scores
($L_1$ and $L_2$), forming respectively the lower and
upper limits of the confidence interval, are called
its lower and upper confidence (fiducial) limits.
The fiducial probability is called the confidence
level when expressed as a percentage, and gives
the degree of confidence in expecting the
parameter to lie in the given confidence interval.
The confidence interval is an interval estimate of
the parameter and is more dependable than a point
estimate (page 11).

Confidence interval for mean

Confidence limits are computed for a
parametric mean ($\mu$) using the critical z or t score
and the standard error ($\sigma_{\bar{X}}$) of the mean.

(a) Using z scores :

Means ($\bar{X}$) of large samples (n ≥ 30) have an
almost normal sampling distribution around $\mu$
irrespective of whether or not the variable has a
normal distribution in the population, provided the
latter has a finite variance (page 83). So, 95% of
the sample means lie in the interval $\mu \pm 1.96\sigma_{\bar{X}}$
where 1.96 is the z score at the tail end of the
fractional area of 0.4750 in each half of the unit
normal curve (Fig. 6.6). The probability is,
therefore, 0.95 that the difference ($\bar{X} - \mu$) ranges
between $-1.96\sigma_{\bar{X}}$ and $+1.96\sigma_{\bar{X}}$. So, using the
unbiased $s_{\bar{X}}$ of the sample, the fiducial probability
amounts to 0.95 that $\mu$ lies in the interval
$\bar{X} \pm 1.96 s_{\bar{X}}$.

Fig. 6.6. Normal curve showing fractional areas for
some confidence intervals.


Table 6.7. Comparison of normal and t distributions.

Normal distributions

1. Continuous probability distributions for scores of
continuous variables.

2. Can be worked out theoretically using the Gaussian
equation.

3. Unimodal distributions.

4. Asymptotic tails, progressively nearing the abscissa
or X axis, but not reaching the latter before $-\infty$ or
$+\infty$.

5. Mean of the distribution amounts to 0.00 which is
the z score for $\mu$.

6. The ordinate at the mean (z = 0.00) is the highest
of all the ordinates, indicating the probability of
the highest number of occurrences of this score;
thus, the mean coincides with the mode of the
distribution.

7. Perfect bilateral symmetry, with no skewness :
skewness coefficients have zero values.

8. The ordinate at the mean (z = 0.00) bisects the
distribution into two halves of equal areas and
identical shapes; the mean thus coincides with
the median.

9. Distributions vary with sample size (n), SD ($\sigma$)
and class interval size (i).

10. A unit normal curve, computed by taking n, $\sigma$ and
i as 1.00 each, is used for reference for all samples.

11. Mesokurtic distributions, having a medium degree
of kurtosis (peakedness) and a percentile coefficient
of kurtosis amounting to 0.263.

12. Applicable for finding probabilities of random
occurrences of scores in large samples drawn from
normally distributed populations.

Student's t distributions

1. Continuous probability distributions for scores of
continuous variables.

2. Can be worked out theoretically using the Gosset
equation.

3. Unimodal distributions.

4. Asymptotic tails, progressively nearing the abscissa
or X axis, but not reaching the latter before $-\infty$ or
$+\infty$.

5. Mean of the distribution amounts to 0.00 which is
the t score for $\mu$.

6. The ordinate at the mean (t = 0.00) is the highest
of all the ordinates so that the mean coincides with
the mode of the distribution.

7. Perfect bilateral symmetry, with no skewness ;
skewness coefficients have zero values.

8. The area of the distribution is divided into two equal
halves of identical shape by the ordinate at the mean
(t = 0.00); the mean thus equals the median.

9. Distributions vary with the degrees of freedom of t
scores, approaching the normal distribution with
the rise in df.

10. Any computed t score has to be referred to the
particular t distribution for the specific df of the
computed t. For large samples, however, the
equation for t distribution resembles that for the
unit normal curve.

11. Leptokurtic distributions with sharper peaks,
narrower bodies and thicker tails than mesokurtic
distributions, and having percentile coefficients of
kurtosis below 0.263; leptokurtosis declines with
the rise in df.

12. Applicable for finding probabilities of random
occurrences of scores in small samples drawn from
normally distributed populations ; applicable in
case of large samples also, because of close identity
of normal distribution with t distributions for large
samples.
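The contrast between normal and t distributions drawn in § 6.7 can be checked numerically from the Gosset equation. The sketch below assumes that the factorial of a half-integer is read as a gamma function, so that [(df − 1)/2]! becomes Γ((df + 1)/2); it works with log-gamma values to avoid overflow at large df. The function names are illustrative.

```python
import math


def t_density(t, df):
    """Ordinate Y of the t curve at score t for the given degrees of freedom."""
    # Y0 = [(df-1)/2]! / (sqrt(pi*df) * [(df-2)/2]!), with x! read as gamma(x + 1).
    log_y0 = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
              - 0.5 * math.log(math.pi * df))
    return math.exp(log_y0) * (1 + t * t / df) ** (-(df + 1) / 2)


def unit_normal_density(z):
    """Ordinate of the unit normal curve at score z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# Ordinates at the centre (t = 0) and in the tail (t = 3) for rising df.
for df in (2, 8, 30, 1000):
    print(df, round(t_density(0.0, df), 4), round(t_density(3.0, df), 4))
print("normal", round(unit_normal_density(0.0), 4), round(unit_normal_density(3.0), 4))
```

The tail ordinates of the t curves exceed the normal one and shrink towards it as df rises, which is the thicker-tail behaviour attributed to t distributions in the text.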

Similarly, 99% of the sample means lie in the
interval $\mu \pm 2.58\sigma_{\bar{X}}$ where 2.58 is the z score at
the tail end of the fractional area of 0.4951 in each
half of the unit normal curve (Fig. 6.6). So, the
fiducial probability amounts to 0.99 that $\mu$ lies in
the interval $\bar{X} \pm 2.58 s_{\bar{X}}$.

Thus, for the fiducial probability of $(1 - \alpha)$,
the two confidence limits are worked out as
follows using a random sample drawn with
replacement :

$$L_1 = \bar{X} - z_\alpha s_{\bar{X}} ; \quad L_2 = \bar{X} + z_\alpha s_{\bar{X}} ;$$

where $z_\alpha$ is the critical z score, beyond which lies
a fractional area of $\alpha/2$ in each tail and a total
fractional area of $\alpha$ in two tails of the unit normal
curve; the fractional area from $\mu$ to $z_\alpha$ evidently
amounts to $(1 - \alpha)/2$ in each half of the normal
curve. Thus, for a fiducial probability $(1 - \alpha)$ of
0.95,

$$L_1 = \bar{X} - 1.96 s_{\bar{X}} ; \quad L_2 = \bar{X} + 1.96 s_{\bar{X}} ;$$

similarly, for a fiducial probability of 0.99,

$$L_1 = \bar{X} - 2.58 s_{\bar{X}} ; \quad L_2 = \bar{X} + 2.58 s_{\bar{X}}.$$

Using a sample drawn without replacement
from a finite population of size N,

$$L_1 = \bar{X} - z_\alpha s_{\bar{X}} \sqrt{\frac{N-n}{N-1}} ; \quad L_2 = \bar{X} + z_\alpha s_{\bar{X}} \sqrt{\frac{N-n}{N-1}}.$$




Fig. 6.7. t distribution showing fractional areas
$(1 - \alpha)$ for (A) 95% and (B) 99% confidence
intervals.

(b) Using t scores :

Irrespective of the sample size, the confidence
limits for a fiducial probability of $(1 - \alpha)$ can be
computed using the critical t score with the given
df.

$$L_1 = \bar{X} - t_{\alpha(df)}\, s_{\bar{X}} ; \quad L_2 = \bar{X} + t_{\alpha(df)}\, s_{\bar{X}} ;$$

where $t_{\alpha(df)}$ is the critical t score, beyond which
lies a fractional area of $\alpha/2$ in each tail and a total
fractional area of $\alpha$ in two tails of the t curve with
the given df ; the fiducial probability equals the
area $(1 - \alpha)$ at the centre of the t curve between
$-t_{\alpha(df)}$ and $+t_{\alpha(df)}$ (Fig. 6.7).

The higher the fiducial probability $(1 - \alpha)$, the
wider is the confidence interval and the greater is
the probability of $\mu$ falling in that interval, but
the lower is the precision of the estimate for the
true value of $\mu$.





Example 6.8.1. 

The mean and SD of blood lactate concentration amounted respectively to 11.9 and 1.76 mg dL$^{-1}$ in a sample of 9
male college students. Find the confidence limits of the population mean at 0.95 and 0.99 fiducial probabilities.

Solution :

$\bar{X}$ = 11.9 mg ; s = 1.76 mg ; n = 9 ; $s_{\bar{X}} = \dfrac{s}{\sqrt{n}} = \dfrac{1.76}{\sqrt{9}} = 0.587$ ; df = n − 1 = 9 − 1 = 8.

(a) 0.99 fiducial probability :

$1 - \alpha = 0.99$ ; $\therefore \alpha = 0.01$ ; $t_{\alpha(df)} = t_{.01(8)} = 3.355$ (Table B).

$L_1 = \bar{X} - t_{.01(8)}\, s_{\bar{X}} = 11.9 - 3.355 \times 0.587 = 9.93$ mg ;

$L_2 = \bar{X} + t_{.01(8)}\, s_{\bar{X}} = 11.9 + 3.355 \times 0.587 = 13.87$ mg.

(b) 0.95 fiducial probability :

$1 - \alpha = 0.95$ ; $\therefore \alpha = 0.05$ ; $t_{\alpha(df)} = t_{.05(8)} = 2.306$ (Table B).

$L_1 = \bar{X} - t_{.05(8)}\, s_{\bar{X}} = 11.9 - 2.306 \times 0.587 = 10.55$ mg ;

$L_2 = \bar{X} + t_{.05(8)}\, s_{\bar{X}} = 11.9 + 2.306 \times 0.587 = 13.25$ mg.
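The limits of Example 6.8.1 can be reproduced with a few lines of Python. The critical t scores are passed in as read from the table quoted in the example, since the Python standard library has no t quantile function; the function name is an assumption of this sketch.

```python
import math


def confidence_limits(mean, sd, n, critical_score):
    """Lower and upper confidence limits: mean -/+ critical score x SE of the mean."""
    se_mean = sd / math.sqrt(n)
    margin = critical_score * se_mean
    return mean - margin, mean + margin


# Critical t scores for df = 8, as quoted in the example (Table B).
lower99, upper99 = confidence_limits(11.9, 1.76, 9, 3.355)
lower95, upper95 = confidence_limits(11.9, 1.76, 9, 2.306)
print(round(lower99, 2), round(upper99, 2))
print(round(lower95, 2), round(upper95, 2))
```

The 0.99 interval comes out wider than the 0.95 interval, as the text states for a higher fiducial probability.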


Example 6.8.2. 

The mean and SD of the scores obtained by 256 children in a digital-symbol learning test were found to be 82
and 32 respectively. Find the fiducial limits of the population mean at 99% confidence level.

Solution :

$\bar{X}$ = 82 ; s = 32 ; n = 256 ; $s_{\bar{X}} = \dfrac{s}{\sqrt{n}} = \dfrac{32}{\sqrt{256}} = 2.0$.

$1 - \alpha = 0.99$ ; $(1 - \alpha)/2 = 0.4950$.

The z score, having fractional area of 0.4950 between it and $\mu$, amounts to 2.58 (Table A).

$L_1 = \bar{X} - z_\alpha s_{\bar{X}} = 82 - 2.58 \times 2 = 76.84$ ;
$L_2 = \bar{X} + z_\alpha s_{\bar{X}} = 82 + 2.58 \times 2 = 87.16$.
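For z-based limits the critical score can be computed instead of read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the function names are illustrative, and the optional correction for sampling without replacement assumes the usual factor $\sqrt{(N-n)/(N-1)}$. Because the computed z (about 2.576) is not rounded to 2.58, the limits differ from those of Example 6.8.2 in the second decimal place.

```python
import math
from statistics import NormalDist


def critical_z(confidence):
    """z score leaving alpha/2 in each tail of the unit normal curve."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def z_confidence_limits(mean, sd, n, confidence, population_size=None):
    """Confidence limits of a population mean from a large sample."""
    se_mean = sd / math.sqrt(n)
    if population_size is not None:
        # Assumed finite-population correction for sampling without replacement.
        se_mean *= math.sqrt((population_size - n) / (population_size - 1))
    margin = critical_z(confidence) * se_mean
    return mean - margin, mean + margin


low, high = z_confidence_limits(82, 32, 256, 0.99)
print(round(critical_z(0.95), 2), round(critical_z(0.99), 2))
print(round(low, 2), round(high, 2))
```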


6.9 BINOMIAL DISTRIBUTION 

If a population is divided into two classes with
respect to a variable, the probabilities of
occurrence of different combinations of events or
individuals of the two classes in a sample, drawn
from that population, are given by the series of
terms of the binomial equation. Thus, the
distribution of the probabilities of occurrence (the
relative expected frequencies) of events or
individuals of either class can be computed
theoretically from the binomial equation and
constitutes a binomial probability distribution.

Computation of binomial probabilities 

(a) Using the binomial expansion directly: 

Where n is the total number of events or cases
in the sample, p and q are the proportions of the
population in the two classes, and q equals (1 − p),
the binomial expansion may be written as follows:

$$(p+q)^n = p^n + np^{n-1}q + \frac{n(n-1)}{1 \times 2}\,p^{n-2}q^2 + \frac{n(n-1)(n-2)}{1 \times 2 \times 3}\,p^{n-3}q^3 + \cdots + \frac{n(n-1)(n-2) \cdots 3 \times 2}{1 \times 2 \times 3 \times \cdots \times (n-1)}\,pq^{n-1} + q^n.$$

In this expansion, each term such as $p^n$,
$np^{n-1}q$ and $q^n$ gives the relative expected frequency
or probability of occurrence of that specific
combination of events or cases of two classes as
is given by the respective powers of p and q in
that term. The coefficients such as n,
n(n − 1)/(1 × 2), etc., of these terms are the
binomial coefficients which can be directly
obtained from Pascal's triangle.

If, say, a sample of 10 individuals is drawn
from a population with 0.55 and 0.45 proportions
of cases in its two classes, the probability (P) of
the random occurrence of a combination of 8 cases
of the p class and 2 cases of the q class in the
sample is given by the binomial expansion as
follows :

n = 10 ; p = 0.55 ; q = 0.45 ;

cases of q class = 2 ;
cases of p class = n − 2 = 10 − 2 = 8 ;

$$P = \frac{10 \times (10-1)}{1 \times 2} \times (0.55)^8 \times (0.45)^2 = 0.076.$$

(b) Using the Bernoulli expansion :

The probability of a given combination of cases
of the two classes (or of a given number of cases
in one of the classes) in a binomial distribution is
also given by the Bernoulli distribution formulated
by Jacques Bernoulli (1654-1705), a Swiss
mathematician. Where the given combination
consists of X and (n − X) numbers of cases of
respectively p and q classes, and n!, X! and
(n − X)! represent the factorials of n, X and
(n − X) respectively, the probability P(X) of that
combination is thus given by the Bernoulli
expansion :

$$P(X) = \frac{n!\,p^X q^{n-X}}{X!\,(n-X)!} = \frac{[n(n-1)(n-2) \cdots 3 \times 2 \times 1]\,p^X q^{n-X}}{[X(X-1)(X-2) \cdots 2 \times 1]\,[(n-X) \cdots 2 \times 1]}$$

For the preceding example,

$$P(8) = \frac{10!\,(0.55)^8 \times (0.45)^2}{8!\,(10-8)!} = \frac{[10 \times 9 \times \cdots \times 3 \times 2 \times 1] \times (0.55)^8 \times (0.45)^2}{[8 \times 7 \times \cdots \times 3 \times 2 \times 1]\,[2 \times 1]} = 0.076.$$

If the probability P, whether worked out using
the binomial expansion directly or using the
Bernoulli expansion, is either equal to or lower
than 0.05, it is generally inferred that the given
combination may have occurred in the sample
owing to reasons other than random sampling
(P ≤ 0.05). But if P exceeds 0.05, it may be
inferred that there is a high probability of the given
combination occurring in the sample by mere
chances of random sampling itself.

The binomial distribution of the probabilities
of events of either class may be represented by a
bar diagram (Fig. 6.8). Numbers of events of the
specified class are scaled along the X axis and
form the midpoints of the bases of the bars ;
heights of the bars are determined by the relative
frequencies (probabilities) scaled along the Y axis.
Because the events occur in whole numbers only,
the bars are separated by intervening gaps to
indicate the discrete nature of distribution.

A binomial distribution may also be
represented by a frequency polygon (Fig. 6.9),
plotting the relative frequencies (probabilities),
scaled along the Y axis, against the respective
number of events scaled along the X axis.

In problems like those involving sex-ratios,
fertility rates, mortality rates, infection rates,
[...] experiments and psychological [...]
the probability of events.

The absolute expected frequency ($f_e$) of a given
combination in a certain number of samples or
trials is obtained by multiplying the relative
expected frequency or probability, computed as
above for a sample or trial, by the total number
(k) of such samples or trials used. In the preceding
example, the absolute expected frequency of the
combination of 8 cases of p class and 2 cases of q
class in 100 such samples of size 10 each, is given
by:

$f_e(X) = kP(X) = 100 \times 0.076 = 7.6.$
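The probability of a given combination, and its absolute expected frequency, can be sketched in Python as below. `math.comb` supplies the binomial coefficient n!/[X!(n − X)!]; the function names are assumptions of this illustration, and the figures are those of the preceding example (n = 10, p = 0.55, 8 cases of the p class, k = 100).

```python
import math


def binomial_probability(x, n, p):
    """P(X): probability of x cases of the p class in a sample of n."""
    q = 1.0 - p
    return math.comb(n, x) * p ** x * q ** (n - x)


def absolute_expected_frequency(x, n, p, k):
    """Expected number of samples, out of k, showing x cases of the p class."""
    return k * binomial_probability(x, n, p)


p8 = binomial_probability(8, 10, 0.55)
print(round(p8, 3))
print(round(absolute_expected_frequency(8, 10, 0.55, 100), 1))
# The probabilities of all possible combinations add up to 1.
print(round(sum(binomial_probability(x, 10, 0.55) for x in range(11)), 6))
```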

Fig. 6.8. Bar diagram of a binomial distribution
of events belonging to the class with
the proportion p in the population.

Fig. 6.9. Frequency polygons of binomial
distributions of events of the class with
the proportion p in the population.

Properties

1. The binomial distribution is a discrete
probability distribution. It gives probabilities of
whole numbers (0, 1, 2, 3, ..., n) of events or cases
of a class; because there cannot be any fractional
occurrence of an event, the whole numbers of
events form a discontinuous series with
intervening gaps between them. This makes the
distribution discrete.

2. It is the probability distribution of
dichotomized variables divided into only two
classes according to the presence or absence of
some property. Such variables include sex, HIV
positive-negative, success-failure, yes-no answers,
and pregnant-nonpregnant.

3. Binomial probability distributions can be
worked out theoretically on the basis of the
binomial equation (see above). So, the probability
of a given number of cases in either class, or of a
given combination of cases in both classes of the
variable, can be obtained from a specific term
of the binomial expansion.

4. Neither of the two classes consists of rare
cases so that neither of their respective
proportions, p and q, in the population is too much
lower than 0.50.

5. The events or cases of either class occur at
random and are independent of each other.

6. Mean ($\mu$), variance ($\sigma^2$), SD ($\sigma$) and the
coefficient of dispersion (CD) of a binomial
distribution of cases of the class having the
proportion p in the population, are obtained from
the sample size (n) and the proportions (p and q)
of the cases in the two classes.

$$\mu = np ; \quad \sigma^2 = npq ; \quad \sigma = \sqrt{npq} ; \quad CD = \frac{\sigma^2}{\mu} = \frac{npq}{np} = q.$$

7. Skewness and kurtosis of binomial
distribution depend on the proportions (p and q)
of the two classes in the population.

(a) The moment coefficient ($\gamma_1$) of skewness
of a binomial distribution of the class having the
proportion p in the population, is given by :

$$\gamma_1 = \frac{q - p}{\sqrt{npq}}$$

Thus the distribution is bilaterally symmetrical
and has no skewness when p = q = 0.50. But it has
a negative skewness if p > 0.50, and its left or
low-value tail is more drawn out than the right
one. On the contrary, it has a positive skewness
with a longer right tail when p < 0.50.

(b) The binomial distribution is platykurtic so
long as p lies between about 0.2114 and 0.7886,
but turns leptokurtic if p is either below 0.2114 or
above 0.7886. Its kurtosis is given as follows by
the moment coefficient ($\gamma_2$) of kurtosis :

$$\gamma_2 = \frac{1 - 6pq}{npq}$$
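The summary measures listed under properties 6 and 7 can be gathered in one small function. This is only a sketch: the function and key names are hypothetical, and the coefficient of dispersion is taken as variance divided by mean, which for a binomial distribution reduces to q.

```python
import math


def binomial_summary(n, p):
    """Mean, variance, SD, CD and moment coefficients of a binomial distribution."""
    q = 1.0 - p
    mean = n * p
    variance = n * p * q
    return {
        "mean": mean,
        "variance": variance,
        "sd": math.sqrt(variance),
        "cd": variance / mean,                      # reduces to q
        "skewness": (q - p) / math.sqrt(variance),  # gamma_1
        "kurtosis": (1.0 - 6.0 * p * q) / variance, # gamma_2
    }


for p in (0.1, 0.4, 0.5, 0.9):
    s = binomial_summary(10, p)
    print(p, round(s["skewness"], 3), round(s["kurtosis"], 3))
```

For n = 10 the printed skewness is positive below p = 0.50, zero at 0.50 and negative above it, while the kurtosis coefficient is negative at p = 0.4 and 0.5 and positive at p = 0.1 and 0.9, in line with the limits of about 0.2114 and 0.7886 given in the text.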


Assumptions 

For applying the binomial distribution to the 
observed results, it should be justifiable to assume 
that: 

(a) the population is dichotomized with an 
intervening gap between the two classes ; 

(b) the events or cases of each class occur at 
random and independent of each other in the 
sample so that occurrence of one event does not 
influence the probability of occurrence of any 
other event; 

(c) the proportions of cases in the two classes
of the population have remained unchanged
during sampling and are known with reasonable
accuracy ;

(d) the mean and variance of the distribution
of events of the relevant class (p) are given by the
following : $\mu = np$ ; $\sigma^2 = npq$ ;

(e) the coefficient of dispersion (CD) of the 
events of one class equals the proportion of cases 
in the other, and falls short of 1 : CD = q < 1.00. 


Example 6.9.1. 

Work out and interpret the binomial probability of random occurrence of 8 male cockroaches in a sample of
10 drawn from a cockroach population with a male : female ratio of 45 : 55. Also find the absolute expected
frequency of occurrence of a similar combination of males and females in 100 such samples of 10 cockroaches
each.


Solution : 

proportion (p) of males = $\dfrac{45}{100}$ = 0.45 ; proportion (q) of females = $\dfrac{55}{100}$ = 0.55 ;

sample size (n) = 10 ; number (X) of males in sample = 8 ; number (k) of samples = 100.

(a) Using the binomial expansion :

$$(p+q)^n = p^n + np^{n-1}q + \frac{n(n-1)}{1 \times 2}\,p^{n-2}q^2 + \cdots + \frac{n(n-1)(n-2) \cdots 3 \times 2}{1 \times 2 \times 3 \times \cdots \times (n-1)}\,pq^{n-1} + q^n$$

or,

$$(0.45 + 0.55)^{10} = (0.45)^{10} + 10 \times (0.45)^9 \times 0.55 + \frac{10 \times 9}{1 \times 2} \times (0.45)^8 \times (0.55)^2 + \cdots + \frac{10 \times 9 \times 8 \times \cdots \times 3 \times 2}{1 \times 2 \times 3 \times \cdots \times 8 \times 9} \times 0.45 \times (0.55)^9 + (0.55)^{10}.$$

Probability P(8) of the combination of 8 males and 2 females in a sample is given by the third term of the
expansion, which has 8 and 2 as the respective powers of p and q. Thus,

$$P(8) = \frac{10 \times 9}{1 \times 2} \times (0.45)^8 \times (0.55)^2 = 0.023.$$

(b) Using the Bernoulli expansion :

$$P(X) = \frac{n!\,p^X q^{n-X}}{X!\,(n-X)!}$$

or,

$$P(8) = \frac{10!\,(0.45)^8 \times (0.55)^2}{8!\,(10-8)!} = \frac{(10 \times 9 \times \cdots \times 2 \times 1) \times (0.45)^8 \times (0.55)^2}{(8 \times 7 \times \cdots \times 2 \times 1)(2 \times 1)} = 0.023.$$

As the binomial probability P is found to be lower than 0.05, it may be inferred that the given number of 8
males has occurred in the sample of 10 owing to reasons other than random sampling (P < 0.05).

Absolute expected frequency ($f_e$) of the similar combination of males and females in k number of such samples
(k = 100) is given by :

$f_e(8) = kP(8) = 100 \times 0.023 = 2.3.$


Example 6.9.2. 

Find and interpret the probability of random occurrence of 7 goitre cases in a sample of 10 drawn from a
population with 40% incidence of endemic goitre. What is the absolute expected frequency of occurrence of a
similar combination of goitre and nongoitre cases in 250 such samples of 10 individuals each?


Solution : 

proportion (p) of goitre cases = $\dfrac{40}{100}$ = 0.40 ;

proportion (q) of nongoitre cases = 1.00 − 0.40 = 0.60.

sample size (n) = 10 ; number of goitre cases (X) in sample = 7 ; number (k) of samples = 250.

(a) Using the binomial expansion :

$$(p+q)^n = p^n + np^{n-1}q + \frac{n(n-1)}{1 \times 2}\,p^{n-2}q^2 + \frac{n(n-1)(n-2)}{1 \times 2 \times 3}\,p^{n-3}q^3 + \cdots + \frac{n(n-1)(n-2) \cdots 3 \times 2}{1 \times 2 \times 3 \times \cdots \times (n-2)(n-1)}\,pq^{n-1} + q^n$$

or,

$$(0.40 + 0.60)^{10} = (0.40)^{10} + 10 \times (0.40)^9 \times 0.60 + \frac{10 \times 9}{1 \times 2} \times (0.40)^8 \times (0.60)^2 + \frac{10 \times 9 \times 8}{1 \times 2 \times 3} \times (0.40)^7 \times (0.60)^3 + \cdots + \frac{10 \times 9 \times 8 \times \cdots \times 3 \times 2}{1 \times 2 \times 3 \times \cdots \times 8 \times 9} \times (0.40) \times (0.60)^9 + (0.60)^{10}.$$

Probability P(7) of the combination of 7 goitre and 3 nongoitre cases is given by the fourth term of the
expansion, which has 7 and 3 as the powers of p and q, respectively. Thus,

$$P(7) = \frac{10 \times 9 \times 8}{1 \times 2 \times 3} \times (0.40)^7 \times (0.60)^3 = 0.042.$$

(b) Using the Bernoulli expansion :

$$P(X) = \frac{n!\,p^X q^{n-X}}{X!\,(n-X)!}$$

or,

$$P(7) = \frac{10!\,(0.40)^7 \times (0.60)^3}{7!\,(10-7)!} = \frac{(10 \times 9 \times \cdots \times 2 \times 1) \times (0.40)^7 \times (0.60)^3}{(7 \times 6 \times \cdots \times 2 \times 1) \times (3 \times 2 \times 1)} = 0.042.$$

As the binomial probability P is found to be lower than 0.05, it may be inferred that the given number of
goitre cases has occurred in the sample of 10 owing to reasons other than random sampling (P < 0.05).

Absolute expected frequency ($f_e$) of the similar combination of goitre and nongoitre cases in k number of
such samples (k = 250) is given by :

$f_e(7) = kP(7) = 250 \times 0.042 = 10.5.$
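Both worked examples pick one term out of the binomial expansion, which can be listed in full as a check. In this sketch `expansion_terms` is a hypothetical helper returning the terms in the order of the expansion, so that the term at index j carries $q^j$. Note that multiplying the unrounded probability of Example 6.9.2 by 250 gives about 10.6 rather than 10.5, because the worked solution rounds P to 0.042 first.

```python
import math


def expansion_terms(n, p):
    """Successive terms of (p + q)^n; the term at index j holds p^(n-j) * q^j."""
    q = 1.0 - p
    return [math.comb(n, j) * p ** (n - j) * q ** j for j in range(n + 1)]


cockroach_terms = expansion_terms(10, 0.45)  # Example 6.9.1
goitre_terms = expansion_terms(10, 0.40)     # Example 6.9.2

p_8_males = cockroach_terms[2]   # third term: 8 males, 2 females
p_7_goitre = goitre_terms[3]     # fourth term: 7 goitre, 3 nongoitre cases

print(round(p_8_males, 3), round(100 * p_8_males, 1))
print(round(p_7_goitre, 3), round(250 * p_7_goitre, 1))
```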


6.10 POISSON DISTRIBUTION 

In a population dichotomized with respect to 
a variable, one of the two classes may sometimes 
consist of rare events or cases forming a very low 
proportion of the population. The distribution of 
relative expected frequencies or probabilities of 
different numbers of such rare events in a sample 
from such a population may conform to a 
theoretical probability distribution, called the 
Poisson distribution after its formulator S. D. 
Poisson, a French mathematician. In such cases, 
the probabilities or relative expected frequencies 

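of 0, 1, 2, 3, ..., n numbers of rare events, among
the total number n of all the events, are given by
the successive terms of the following Poisson
distribution :

$$\frac{1}{e^{\mu}} ; \ \frac{\mu}{1!\,e^{\mu}} ; \ \frac{\mu^2}{2!\,e^{\mu}} ; \ \frac{\mu^3}{3!\,e^{\mu}} ; \ \ldots ; \ \frac{\mu^n}{n!\,e^{\mu}} ;$$

where $\mu$ is the mean of the Poisson distribution, n
is the sample size, and e is the base of the natural
logarithm, approximating 2.7183. Using the
sample mean $\bar{X}$, these probabilities may be stated
in terms of the Poisson distribution in the
following series:

$$\frac{1}{e^{\bar{X}}} ; \ \frac{\bar{X}}{1!\,e^{\bar{X}}} ; \ \frac{\bar{X}^2}{2!\,e^{\bar{X}}} ; \ \frac{\bar{X}^3}{3!\,e^{\bar{X}}} ; \ \ldots ; \ \frac{\bar{X}^n}{n!\,e^{\bar{X}}}$$

Thus, the probability P(X) of X number of rare
events, occurring in a sample, is given by :

$$P(X) = \frac{\bar{X}^X}{X!\,e^{\bar{X}}} = \frac{\bar{X}^X}{[X(X-1)(X-2) \cdots \times 2 \times 1]\,(2.7183)^{\bar{X}}}$$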

The absolute expected frequency (f e ) of a given 
number (X) of rare events in the total number ( k ) 
of samples, each of size n, is obtained as follows : 

f e (X) = kP(X). 
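The successive terms of the Poisson series can be generated directly from the formula for P(X). In this sketch the function names are assumptions, and the mean of 1.0 and the 1000 samples in the usage lines are illustrative values only.

```python
import math


def poisson_probability(x, mean):
    """P(X): probability of x rare events when their mean number is `mean`."""
    return mean ** x / (math.factorial(x) * math.exp(mean))


def poisson_expected_frequency(x, mean, k):
    """Absolute expected frequency of x rare events in k samples."""
    return k * poisson_probability(x, mean)


# First few terms of the series for an illustrative mean of 1.0.
for x in range(6):
    print(x, round(poisson_probability(x, 1.0), 4))
print(round(poisson_expected_frequency(0, 1.0, 1000), 1))
```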

Properties 

1. The Poisson distribution is the probability
distribution of rare events belonging to one of the
two classes of a dichotomized variable. The class,
whose events have a distribution in conformity to
the Poisson distribution, has a very low or near-zero
proportion (p) in the population ; the events
of the other class occur far more frequently and
have a proportion (q) close to 1 in the population.

2. It is a discrete probability distribution
because it is a probability distribution of whole
numbers (0, 1, 2, 3, ..., n) of events, with no
occurrence of fractional numbers of rare events.

3. It can be formed theoretically by working
out probabilities of random occurrences of the
events of the rare class using the Poisson equation
(see above).

4. The rare events occur at random and
independent of each other in the sample; the
probability of occurrence of one rare event is not
increased or decreased by the occurrence of any
other.

5. The rare events may occur either spatially
in a specified space or volume (e.g., Down
syndrome patients in a sample of children, or
abnormal erythrocytes in a hemocytometer
chamber) or temporally in a given time interval
(e.g., number of mutations per day, or number of
suicide cases in a month).

6. Mean ($\mu$) and variance ($\sigma^2$) of a Poisson
distribution of rare events are identical, finite and
very low (less than 5) compared to the total
number n of events of both the classes ; thus, the
coefficient of dispersion amounts to 1 :

$$CD = \frac{\sigma^2}{\mu} = 1.$$

7. Poisson distributions have positive
skewness, given by the moment coefficient ($\gamma_1$)
of skewness :

$$\gamma_1 = \frac{1}{\sqrt{\mu}}$$

For $\mu$ as low as 0.1, the distribution is reverse
J-shaped with no left tail and a prolonged right
tail. As $\mu$ rises, a peak appears, but the right or
high-value tail is longer than the left one; this
positive skewness declines progressively with the
rise in $\mu$.

Fig. 6.10. Frequency polygons of Poisson distribution.

8. Poisson distribution is leptokurtic, its
leptokurtosis declining with the rise in $\mu$. In terms
of the moment coefficient ($\gamma_2$) of kurtosis,

$$\gamma_2 = \frac{1}{\mu} = \frac{1}{np}.$$

Because the shape of Poisson distribution
depends on the mean, probabilities of rare events
also depend on the latter.
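How the shape depends on the mean can be illustrated numerically. The sketch below is an illustration under assumptions: the function names are hypothetical, and the means of 0.1 and 2.5 are chosen only to contrast a reverse J-shaped distribution with a peaked one.

```python
import math


def poisson_probabilities(mean, max_events):
    """Probabilities of 0, 1, ..., max_events rare events for the given mean."""
    return [mean ** x / (math.factorial(x) * math.exp(mean))
            for x in range(max_events + 1)]


def poisson_summary(mean):
    """Variance, CD and moment coefficients, all functions of the mean alone."""
    return {
        "variance": mean,                     # identical to the mean
        "cd": 1.0,                            # variance / mean
        "skewness": 1.0 / math.sqrt(mean),    # gamma_1
        "kurtosis": 1.0 / mean,               # gamma_2
    }


low = poisson_probabilities(0.1, 6)    # reverse J-shaped: highest at 0 events
peaked = poisson_probabilities(2.5, 6) # a peak appears away from 0
print([round(v, 4) for v in low])
print([round(v, 4) for v in peaked])
print(poisson_summary(0.1)["skewness"] > poisson_summary(2.5)["skewness"])
```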

Assumptions 

For applying the Poisson distribution to the 
observed results, it should be justifiable to assume 
that: 

(a) the variable under investigation is divided
into only two classes with intervening gaps
between the two ;

(b) the proportion p of the population, falling
in the rare class whose events are under
consideration, is very low or near-zero while the
proportion q in the other class is correspondingly
high ;

(c) $\mu$ and $\sigma^2$ of the probability distribution of
rare cases are identical, finite, both equal to np
and lower than 5, so that the CD is 1 or nearly 1 ;

(d) the rare events occur at random and 
independent of each other in the sample. 



Table 6.8. Comparison between binomial and Poisson distributions.


Binomial distributions

1. Probability distributions of numbers of cases of
either of the two classes of a dichotomized variable
in samples.

2. Neither p nor q, the respective proportions of the
two classes in the population, is much lower or
higher than 0.50 so that neither can be considered a
rare class.

3. Discrete probability distributions of only whole
numbers of cases of either class, constituting a
discontinuous series without any fractional number
in between.

4. Theoretical model of distribution can be worked out
using the binomial equation.

5. Distributions of cases of a class are bilaterally
symmetrical with no skewness so long as its
proportion p equals 0.50 in the population, but are
negatively skewed when p exceeds 0.50, and
positively skewed when p is less than 0.50; the
moment coefficient of skewness is given by :
$(q - p)/\sqrt{npq}$.

6. Distributions of cases of the p class are platykurtic
so long as p is between about 0.2114 and 0.7886,
but leptokurtic when p is beyond that range; the
moment coefficient of kurtosis is given by :
$(1 - 6pq)/npq$.

7. The mean ($\mu$) and the variance ($\sigma^2$) of the
distribution of cases of the p class are given by np
and npq respectively, where p and q are the
respective proportions of the two classes in the
population.

8. The coefficient of dispersion (CD) of the distribution
of the class with a proportion p in the population
equals the proportion q of the other class and is
consequently lower than 1.00.


Poisson distributions

1. Probability distributions of numbers of cases of one
specific class of a dichotomized variable in samples.

2. Probability distributions of cases of only the rare
class of dichotomized variables, which has a very
low, even near-zero proportion p in the population.

3. Discrete probability distributions of only whole
numbers of cases of the rare class, constituting a
discontinuous series with no fractional number.

4. Theoretical model of distribution can be worked out
using the Poisson equation.

5. Distributions of cases of the rare class are positively
skewed, with the positive skewness declining with
the rise in the mean ($\mu$); the moment coefficient
of skewness is given by : $1/\sqrt{\mu}$.

6. Distributions of cases of the rare class are
leptokurtic, the leptokurtosis declining with the rise
in $\mu$; the moment coefficient of kurtosis is given
by : $1/\mu$.

7. Both $\mu$ and $\sigma^2$ of the distribution of rare cases with
a proportion p in the population are given by np
and amount to less than 5.

8. Being the ratio of $\sigma^2$ to $\mu$, the CD equals 1.00 for
the distribution of the rare class.


Example 6.10.1. 

Work out and interpret the probability of random occurrence of 5 Down syndrome cases in a sample of 200 
en fiom a population having 0.5% incidence of that genetic disorder. Also compute the absolute expected 
“equency of such Down syndrome cases in 1000 such samples. 




Solution :

Where p is the proportion of Down syndrome children in the population,

p = 0.005; n = 200; k = 1000; X = 5; μ = np = 200 × 0.005 = 1.00;

$$P(X) = \frac{\mu^{X}}{X!\,e^{\mu}} = \frac{(1.00)^{5}}{5!\,(2.7183)^{1.00}} = 0.003.$$

As the Poisson probability P is found to be lower than 0.05, it may be inferred that the given number of 5
Down syndrome cases occurred in the sample of 200 owing to reasons other than mere chances of random
sampling (P < 0.05).

Absolute expected frequency (f_e) of such cases in 1000 such samples (k = 1000) is given by :

f_e(X) = kP(X) = 1000 × 0.003 = 3.
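
The arithmetic of Example 6.10.1 may be checked with a short Python sketch. The function name `poisson_probability` is an illustrative assumption and does not come from the text; the sketch simply evaluates the Poisson equation for X = 5 and μ = np = 1.00.

```python
import math

def poisson_probability(x: int, mu: float) -> float:
    """Poisson probability of exactly x rare cases when the mean number of cases is mu."""
    return (mu ** x) * math.exp(-mu) / math.factorial(x)

n, p, k = 200, 0.005, 1000      # sample size, population proportion, number of samples
mu = n * p                      # mean of the Poisson distribution (np)
P = poisson_probability(5, mu)  # probability of exactly 5 cases in one sample
expected_frequency = k * P      # absolute expected frequency in k samples

print(f"mu = {mu:.2f}, P(5) = {P:.4f}, expected frequency = {expected_frequency:.2f}")
```

The unrounded P is slightly above 0.003, so the expected frequency comes out a little above 3; the worked solution rounds P to three decimal places before multiplying by k.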


GLOSSARY 

asymptotic tail : such a tail of a distribution as approaches ever nearer to the abscissa but never meets it within a finite distance.

Bernoulli distribution : a distribution formulated by Jacques Bernoulli to help give the binomial probability of random occurrence of events or cases of either of the classes of a dichotomous variable in a sample.

binomial distribution : discrete probability distributions of cases or events belonging to either of the classes of a dichotomous variable, worked out theoretically from the binomial equation.

central limit theorem : means of large samples have an almost normal sampling distribution around the mean of their population even if the latter has a non-normal distribution of the relevant scores, provided the population has a finite variance.

… of probability : … an almost normal distribution if its scores depend on the independent effects of many other variables, acting at random and with no interaction with each other.

c ” £ram samp,e " *•“«— 

* *“* ^ ^ - working on, a 

kurtosis : the magnitude of peakedness of a probability (or frequency) distribution.

leptokurtosis : the peakedness of a distribution having a higher or sharper peak, a narrower body and thicker tails than the mesokurtic normal distribution.

mesokurtosis : the medium degree of peakedness of the normal distribution, generally used as a standard of peakedness of distributions.

normal distribution : continuous probability distributions which can be theoretically worked out using the Gaussian equation …

normal distribution, best-fitting : … the same mean, SD and sample size as … observed distribution …



platykurtosis : the peakedness of a distribution having a flatter peak, a broader body and thinner tails than the mesokurtic normal distribution.

Poisson distribution : discrete probability distributions of cases or events, belonging to the rare class of a dichotomous variable, worked out theoretically from the equation formulated by S.D. Poisson.

probability : … of cases or scores of a given type among a very large, almost …

probability distribution, continuous : probability distribution of the scores of a continuous measurement variable.

probability distribution, discrete : probability distribution of the scores of a discontinuous variable.

probability distribution, theoretical : probability distribution of the scores of a variable, worked out theoretically on the basis of a specific mathematical equation or model.

Newness, negative : bilateral asvmmrm, ,r 

ShetrnerI!:: 8 ",'"" *^«*S25^ ^ ^ Us negative or low- 


* ls Positive or high- 


«- new, lugn-value tail. - uu 1 

, ZSSrH”""”'■s"i 

“ nu “onoal curve • ,1 " ‘ ^ * Si8nifira "ce."'‘* C " iC fraC " onal in the ,ail( s) of a 

f s «reagaLsttb 'l 0 "" 01 Polity distrihou " 0rmal ourve as 

*' WUito ' K I " hl ' re 1,16 «-Ple Plotting the probabi 

nd the y Kr) of each 




S12e (0 and the SD of raj scores 


art 



7 TESTING OF HYPOTHESIS

The results of an experiment using a sample may have arisen by mere chance owing to the drawing of the particular sample by random sampling. For estimating the probability (P) of such random occurrence of the observed result, a standard score, often z or t, is computed from the observed data and the probability of its random occurrence is then worked out using respectively the normal and t distributions.

7.1 DIFFERENCE BETWEEN MEANS
In an experiment to study the effect of an independent variable (page 5) on a particular dependent variable (page 4), two or more groups may be randomly sampled from a population and exposed to different levels (doses, amplitudes, intensities, etc.) of the independent variable. The dependent variable is then studied or measured in each group to find whether or not the groups differ from each other with respect to the mean of the dependent variable scores. There may, however, be two alternative reasons for any observed difference between the group means. First, the independent variable may not have produced any change in the dependent variable; even then, the mean dependent variable scores of two groups, exposed to different levels of the independent variable, may differ from each other due simply to different sampling errors (page 67) of the group means. Two group means in such a case do not differ significantly, are different estimates of the same population mean for the dependent variable, and their difference is so small as can be accounted for by the SE of difference between the means (pages 72-73) because these groups still belong to the same population. Alternatively, the independent variable may indeed have produced changes in the dependent variable; the difference between group means of the dependent variable, in such cases, is too large to be accounted for by the SE of difference between the means or by the different sampling errors of those means. The group means may then be considered to differ significantly and to be the estimates or representatives of the means of two different populations, to which the two groups are considered as belonging.

In view of these two alternative possibilities, the significance of the difference between the group means must be assessed in terms of the SE of that difference, viz., whether (i) the observed difference is not significant, being small enough to be explainable by the SE of the difference as being due simply to different sampling errors of those means, or (ii) the difference is significant, being too large to be explained away by the SE of the difference as arising merely from the sampling errors. For such assessment, the observed difference between two group means is generally transformed into a standard score (either z or t), using the SE of that difference (pages 74-75); the probability (P) of the computed standard score occurring by mere chance due to random sampling is then found out using either the unit normal curve or the t distribution, as the case may be (vide § 7.6 and 7.7). So long as this computed probability (P) does not exceed a particular chosen level of probability, viz., the level of significance (α, vide § 7.3), P is considered too low (P ≤ α); it is then inferred that the probability P of the observed difference having arisen by mere chance of random sampling is too low and so, the observed difference between the means may be considered significant. But if the computed P exceeds the chosen significance level (P > α), it is considered too high; it is then inferred that the probability P of the random occurrence of the observed difference is too high and so, the means do not differ significantly. In other words, the independent variable has not produced significant changes in the dependent variable.

7.2 NULL HYPOTHESIS
Experiments are generally performed with random samples instead of the entire population (pages 7-9); the inferences drawn from the results observed in samples are then sought to be generalized over the population. Thus, there is always a probability that the observed results may have arisen from the accidental choice of the particular sample drawn by random sampling in accordance with the laws of probability, and would not have been obtained if the entire population were subjected to the experiment instead of a random sample. So, in drawing an inference, the investigator has to weigh this probability (P) of the observed results having arisen from the chances associated with random sampling. In other words, in interpreting the result of any experiment using samples, the investigator has to assess the probability (P) of the correctness of a null hypothesis (H₀) which proposes to nullify or negate the hypothesis under investigation in the experiment, by professing that the observed result (i) has arisen by chance due to the drawing of the particular sample by random sampling, (ii) would not have been obtained if the entire population had been used in the experiment instead of the sample, and (iii) has, therefore, no significance. On the contrary, the hypothesis investigated in the experiment and contested by the null hypothesis is called the alternative hypothesis (Hₐ). For drawing the inference, the result of an experiment has to be subjected to the statistical testing of the contesting hypotheses, H₀ and Hₐ, relevant to that experiment.

The H₀ takes different forms according to the Hₐ that it contests. However, what is common to all forms of H₀ is the proposition that the observed results have been produced solely due to the accidental choice of a particular sample by the operation of probabilities inherent in random sampling. Some examples of H₀ and Hₐ are given below.

(a) In testing the significance of a difference between the means of two groups (samples), the H₀ proposes that the observed difference is not significant, that it has resulted from mere chances of random sampling, and that the difference would have been zero if the entire population were used in the experiment instead of groups sampled at random. In other words, the H₀ contends that both the groups (samples) belong to the same population, their means serve as estimates of the same parametric mean, and the observed difference between the means is due simply to their different sampling errors. In contrast, the Hₐ being tested in the experiment proposes that the two groups (samples) belong to two different populations, that their means are estimates of two different parametric means (μ₁ and μ₂), and that there is a significant difference between the group means, which cannot be explained away by the SE of such differences. Thus,

$$H_a : \mu_1 \neq \mu_2 ; \qquad H_0 : \mu_1 = \mu_2 .$$
(b) For the significance of a correlation between two variables, the H₀ proposes that there is no significant correlation between the variables, that the observed correlation has resulted from the accidental sampling of a particular group depending on laws of probability, and that the correlation coefficient would have amounted to zero if the entire population, instead of a random sample, were studied in the experiment. On the contrary, the Hₐ proposes that there is a significant correlation between the variables; in other words, there is a significant difference between the observed correlation coefficient and zero.

(c) In testing the goodness of fit between a frequency distribution observed in the experiment and a proposed distribution, the H₀ proposes that there is no significant difference between the two distributions, that any observed difference between the two has resulted from the use of a particular sample or group drawn by chance by random sampling, and that the two distributions would have a significant goodness of fit if the entire population were studied instead of a random sample. In contrast, the Hₐ proposes that there is a significant difference between the two distributions, which cannot be explained away by mere chances associated with random sampling.

It should be borne in mind that even in case of the same experiment, there would be different null hypotheses for one-tail and two-tail tests, respectively (vide § 7.5).

Although an experiment is performed to test a specific alternative hypothesis (Hₐ), the latter is not directly tested statistically; on the contrary, its acceptance or rejection is determined by the rejection or retention of the corresponding null hypothesis. The probability (P) of the H₀ being correct, i.e., of the observed difference, correlation or association having arisen by mere chance of random sampling, is worked out statistically. If P does not exceed a chosen level of probability (level of significance, α), the probability of correctness of the H₀ is considered too low (P ≤ α), the H₀ may then be rejected, the Hₐ may instead be accepted, and the observed difference, correlation or association is considered significant (vide § 7.3). But if the estimated P exceeds the chosen α, the probability of correctness or propriety of the H₀ is considered too high (P > α); in that case, the H₀ is retained, the Hₐ cannot be accepted, and the observed results are not significant.

7.3 LEVELS OF SIGNIFICANCE

A level of significance (α) is that maximum probability of the random (chance) occurrence of observed results, up to and below which the probability of correctness of the null hypothesis is considered too low. Thus, so long as the estimated P does not exceed the chosen α (P ≤ α), the H₀ is rejected because of the low probability (P) of its correctness, the Hₐ is accepted, and the results of the experiment are considered significant. But if the estimated P exceeds α (P > α), the probability of the H₀ being correct is considered too high so that the H₀ cannot be rejected; so the H₀ is retained, the Hₐ cannot be accepted, and the observed results are consequently adjudged as not significant.

P ≤ α : H₀ rejected ; Hₐ accepted ; results significant.

P > α : H₀ retained ; Hₐ rejected ; results not significant.

For example, the investigator may use an α of 0.05 for the interpretation of the observed results of an experiment. In such a case, the H₀ is rejected and the observed results are considered significant if the probability P of getting the results by mere chance, due to random sampling, works out to be 0.05 or less (P ≤ 0.05); this means that the results would be adjudged significant if, out of 100 such trials, the observed results may arise only 5 times or fewer merely from the accidental choice of the sample (group) by random sampling. But if P is found here to exceed the chosen α of 0.05 (P > 0.05), the H₀ would be retained because of the too-high probability of its correctness, and the observed results would be adjudged not significant. For biological experiments, α is generally fixed either at the 0.05 level or at a still lower level such as 0.02, 0.01 and 0.001.

The level of significance (α) is given by the fractional area(s) in the tail(s) of the normal or t distribution beyond the relevant critical z or t (pages 82-83 and 91). It may be recalled (pages 82-83) that the sum of the fractional areas beyond a two-tail critical z_α or t_α in both tails of the relevant probability distribution equals the corresponding two-tail significance level (2-tail α) and constitutes the two-tail critical (rejection) region of the distribution (Fig. 7.1 and 7.4a); thus, the fractional area beyond the two-tail critical z_α or t_α in each tail of the distribution corresponds to α/2. So long as the computed z or t score in a two-tail test (vide § 7.5) is not lower than the two-tail critical z_α or t_α, the fractional area beyond that computed z or t score does not exceed the critical region; because the total critical region in both tails corresponds to α and the total fractional area in both tails beyond the computed z or t corresponds to the probability (P) of the correctness of H₀, P does not exceed α so long as the computed z or t either equals or exceeds the critical z_α or t_α, and the H₀ may be rejected. The remaining area in the central part of the normal or t distribution, extending between the critical scores in the two tails of the latter, constitutes the acceptance region (1 − α) of the distribution (Fig. 7.1). If the computed z or t score is lower than the critical z_α or t_α, that computed z or t falls in the acceptance region; the sum of the fractional areas beyond the computed z or t in both tails, corresponding to P, now exceeds the sum (α) of the critical regions in the two tails and encroaches into the acceptance region so that P > α, warranting the acceptance of the H₀.


It may also be recalled (page 82) that the fractional area beyond the critical z_α or t_α in a single tail of the normal or t distribution equals the corresponding one-tail significance level (1-tail α) and constitutes the entire one-tail rejection region of the distribution (Fig. 7.4b). In a one-tail test (vide § 7.5), the H₀ is rejected if the computed z or t score either equals or exceeds the one-tail critical z_α or t_α and consequently, falls in the one-tail rejection region (P ≤ α). On the contrary, the H₀ is accepted in a one-tail test if the computed z or t is lower than the critical z_α or t_α and consequently, falls in the acceptance region (1 − α) of the distribution; here, the acceptance region (1 − α) of the distribution ranges from the critical score in one of the tails and over the total remaining area of the distribution including its entire other tail (Fig. 7.4b).

7.4 ERRORS OF INFERENCE

Because the inference of an experiment is drawn through either the rejection or the acceptance of the H₀ according to the probability (P) of its correctness, the inference always suffers from probabilities of errors of inference owing to either (i) a wrongful rejection or (ii) a wrongful acceptance of the H₀ depending on the estimated P and the chosen α.

Type I error of inference

This consists of the wrongful rejection of a true H₀. In other words, it is the error made in accepting the accidental and insignificant results of an experiment as significant. As the H₀ is rejected if the probability (P) of its correctness does not exceed the significance level (α), the probability of the type I error is limited to the α used in making the inference. Stated otherwise, α is that maximum probability of type I error, which the investigator risks in rejecting the H₀ for making the inference. Thus, the higher the chosen α for considering the H₀ untenable, the greater is the probability of the type I error. If the level of α used in making the inference amounts to 0.01, the type I error has a probability of occurring once in 100 such trials due to random sampling; if the α used is 0.05, the probability of type I error increases to five out of 100 trials.

In making any inference, the status of the probability (P) of the correctness of the H₀ should always be mentioned in relation to the significance level (α) used; e.g., P < 0.01, P = 0.02, P < 0.05, P > 0.05, P < 0.001, etc. This immediately gives out the probability of the type I error in the inference made.
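
That α limits the probability of the type I error may be illustrated by a small simulation. The following is only a sketch under stated assumptions: the population (mean 100, SD 15), the group size of 50, the number of trials and the function names are all hypothetical choices made for illustration and are not taken from the text. Both groups in every trial are drawn from the same population, so the H₀ is true throughout and every rejection is a type I error.

```python
import random
from statistics import NormalDist, mean, stdev

def two_tail_p(sample1, sample2):
    """Two-tail P of the z score for the difference between two sample means."""
    se_diff = (stdev(sample1) ** 2 / len(sample1)
               + stdev(sample2) ** 2 / len(sample2)) ** 0.5
    z = (mean(sample1) - mean(sample2)) / se_diff
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_one_error_rate(alpha, trials=1000, n=50, seed=1):
    """Fraction of trials in which a true H0 is wrongly rejected at the given alpha."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    rejections = 0
    for _ in range(trials):
        # Hypothetical population: both groups share mean 100 and SD 15.
        g1 = [rng.gauss(100, 15) for _ in range(n)]
        g2 = [rng.gauss(100, 15) for _ in range(n)]
        if two_tail_p(g1, g2) <= alpha:
            rejections += 1
    return rejections / trials

rate_05 = type_one_error_rate(0.05)
rate_01 = type_one_error_rate(0.01)
print(f"alpha 0.05: {rate_05:.3f}; alpha 0.01: {rate_01:.3f}")
```

The observed rejection rates should lie close to the respective α values, the rate for α = 0.01 being the lower of the two.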

Type II error of inference

This consists of a wrongful acceptance of a false H₀. It is thus the error committed in rejecting the really significant results of an experiment as insignificant. Stated otherwise, the type II error (β) is the failure in identifying the genuine results of an experiment and consequently rejecting a true Hₐ. Important factors in the type II error are as follows.

(a) Type II error (β) depends on the overlap between the distribution proposed by the Hₐ around its parametric mean (μ_a) and the H₀ distribution around the parametric mean (μ₀) of the latter (Fig. 7.2). Indeed, β is that fractional area of the Hₐ distribution which falls within the acceptance region (1 − α) of the H₀ distribution.

Fig. 7.2 Relations between type I and type II errors. (a) Two-tail α : 0.01. (b) Two-tail α : 0.05.

Fig. 7.3 Higher type II error (β) when the parametric means of H₀ and Hₐ distributions are closer (as in b) than when they are wider apart (as in a).

(b) The lower the significance level (α) used in rejecting the H₀, the narrower is the rejection region of the H₀ distribution and the lower is the probability of the type I error; but simultaneously, the wider is the acceptance region (1 − α), thus increasing the overlap between the Hₐ distribution and the latter, and enhancing the probability of the type II error or β correspondingly (Fig. 7.2a). Reverse is the case when α is increased. Thus, there is an inverse relationship between the probabilities of type I and type II errors (Fig. 7.2). Generally, a smaller α like 0.02, 0.01 and 0.001 may be chosen in spite of a higher risk of β, when it is intended to limit the risk of a wrong positive inference due to the acceptance of the Hₐ.

(c) The closer the parametric means (μ₀ and μ_a) of the H₀ and Hₐ distributions, the greater is the overlap of the two distributions (Fig. 7.3). This makes it more difficult to discriminate Hₐ from H₀ and consequently enhances the probability of the type II error (β) of a wrongful acceptance of a false H₀. On the contrary, β declines if μ_a and μ₀ are wider apart.
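
The three factors listed above may be put into numbers with a short sketch. It assumes, for illustration only, that both the H₀ and the Hₐ distributions are normal with the same SE, and it expresses the distance between μ_a and μ₀ in units of that SE; the function name `type_two_error` and the separations 3.0 and 1.5 are hypothetical. β is then the area of the Hₐ distribution that falls inside the two-tail acceptance region of the H₀ distribution.

```python
from statistics import NormalDist

def type_two_error(alpha: float, separation: float) -> float:
    """Beta for a two-tail z test.

    separation is the distance between the parametric means of the Ha and H0
    distributions in SE units (assumption: both are normal with the same SE).
    """
    unit = NormalDist()
    z_crit = unit.inv_cdf(1 - alpha / 2)  # two-tail critical z for this alpha
    # Area of the Ha distribution lying within the acceptance region of H0.
    return unit.cdf(z_crit - separation) - unit.cdf(-z_crit - separation)

for alpha in (0.05, 0.01):
    for separation in (3.0, 1.5):
        beta = type_two_error(alpha, separation)
        print(f"alpha = {alpha}, separation = {separation}: beta = {beta:.3f}")
```

Under these assumptions, β rises when α is lowered from 0.05 to 0.01 and also when the two parametric means are brought closer together.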

7.5 ONE-TAIL AND TWO-TAIL TESTS

Either a two-tail or a one-tail statistical test is used according to the H₀ in question.

Two-tail test

A two-tail test is a nondirectional statistical test for finding the significance of only the magnitude of the observed difference between the means of two groups (samples), irrespective of the algebraic sign of that difference. For example, if the mean serum cholesterol amounts to 220 mg dL⁻¹ in a group of diabetics and 180 mg dL⁻¹ in a group of nondiabetics, a two-tail t test may find whether or not there is a significant difference between the group means, without considering whether one of the group means is significantly higher (or lower) than the other. The H₀ proposes here that there is no significant difference between the group means, any observed difference having resulted from chances associated with random sampling. Since both positive and negative differences between the group means are under consideration here, the differences may lie in both tails of the distribution. Thus, the rejection region of the H₀ distribution involves both tails, amounting to α/2 in each (Fig. 7.4a). The sum of the fractional areas in both tails of the H₀ distribution thus gives the two-tail probability (P) of correctness of the H₀. The inference of such a two-tail test is limited to the existence or absence of any significant difference between the group means and does not take into consideration whether or not one of the means is higher or lower than the other.



variables in an experiment. 

Whether a two-tail test or a one-tail test is proposed, the observed result (e.g., a difference between two means, or a correlation coefficient between two variables) is first transformed into a standard score (e.g., z score, t score) by a process common to both types of tests (pages 75, 89-90). The computed standard score may next be used in working out either the two-tail probability (2-tail P) for a two-tail test or the one-tail probability (1-tail P) for a one-tail test (page 82). In a two-tail test, the observed result (e.g., the observed difference between two means) is considered significant, only if the computed P does not exceed the chosen two-tail significance level (P ≤ α). Similarly in a one-tail test, the observed result (e.g., a negative X̄₁ − X̄₂ difference) is considered significant, only if the computed P does not exceed the chosen one-tail significance level or 1-tail α (P ≤ α).

For a two-tail test using the 0.05 level of significance, i.e., the 2-tail α of 0.05, each tail of the H₀ distribution ends with a rejection or critical region (α/2) of area 0.025, extending beyond the two-tail critical z score of 1.96 in that tail (Fig. 7.4a). If the z score computed from an observed (X̄₁ − X̄₂) difference lies either at or beyond −1.96 or +1.96, it falls within the respective rejection regions and the H₀ is rejected; the observed difference is then considered significant. But in a one-tail test using the 0.05 level of significance, the 1-tail α of 0.05 involves a single tail of the H₀ distribution; the rejection or critical region is thus restricted to that tail only, extending over the entire α area of 0.05 which is the fractional area beyond the one-tail critical z score of 1.645 in that tail (Fig. 7.4b). If the z score computed from the observed (X̄₁ − X̄₂) difference amounts to 1.645 or more, it falls in the single-tail rejection region and the H₀ may consequently be rejected. It is thus evident that with an identical α, an observed difference may be significant in a one-tail test, but may fail to be significant in a two-tail test: a computed z score of 1.645 in the above example is significant for a one-tail test at the 0.05 level of significance, but is not significant for a two-tail test at the same 0.05 level.

Some one-tail and two-tail critical scores for 
different significance levels are tabulated in 
Table 7.1. 
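
Critical z scores of this kind can be recomputed from the unit normal curve. The sketch below is not a reproduction of Table 7.1; it assumes only the standard normal distribution as provided by Python's `statistics.NormalDist`, and the function name `critical_z` is an illustrative choice.

```python
from statistics import NormalDist

def critical_z(alpha: float, tails: int) -> float:
    """Critical z beyond which the rejection region of area alpha lies.

    For a two-tail test (tails=2) the area alpha is split equally between the
    tails; for a one-tail test (tails=1) it lies entirely in a single tail.
    """
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.05, 0.02, 0.01, 0.001):
    one_tail = critical_z(alpha, tails=1)
    two_tail = critical_z(alpha, tails=2)
    print(f"alpha = {alpha}: one-tail z = {one_tail:.3f}, two-tail z = {two_tail:.3f}")
```

For α = 0.05 this gives the one-tail and two-tail critical z scores of 1.645 and 1.96 quoted in the text.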

7.6 SIGNIFICANCE OF DIFFERENCE 
BETWEEN MEANS USING Z SCORES 

For an experiment to study the effects of an independent variable on a particular dependent variable in a given population (pages 4-5), often separate groups of individuals are drawn from that population by random sampling, and each group is treated with or exposed to a specific level, i.e., a particular amount, concentration, amplitude, intensity or qualitative type, of the independent variable. Frequently, one of the groups, called the control group, is given a level of treatment that is free from the independent variable while the others, called the experimental groups, receive different levels of the treatment containing specific doses or amounts of the independent variable. Subsequently, the dependent variable is estimated or studied in all the groups. Such an experiment, using groups consisting of separate sets of individuals or cases, is called an independent group experiment.


For drawing inference about the significance of the observed difference between the means of two large independent groups, each not less than 30 in size (n ≥ 30), the unit normal curve (Table A of Appendix) is used in working out the probability (P) of the correctness of the H₀, because the means of large groups, drawn from a normally distributed population, as well as their differences have sampling distributions conforming to the normal probability distribution (page 83). For this, the observed difference between the group means, say (X̄₁ − X̄₂), is transformed into the standard z score and the probability (P) of occurrence of the computed z score by mere chance of random sampling is worked out using the unit normal curve areas. The H₀ is rejected and the observed difference between the means, X̄₁ − X̄₂, is considered significant, only if the P thus worked out does not exceed the chosen level of significance (P ≤ α).

For using any statistical test, the variable under investigation, the relevant population and the obtained data must fulfil certain criteria. It is, however, not always necessary to work out whether the required criteria have actually been fulfilled; it will suffice if it is justifiable to assume that the required criteria have been fulfilled. Such criteria are thus called the assumptions for the test.

Assumptions for tests using z scores

For finding the significance of the observed result using z scores, it should be justifiable to assume that :

(a) the dependent variable is a continuous measurement variable ;

(b) its scores have a normal distribution in the population ;

(c) each score occurs in a group (sample) at random and independent of all other scores; this ensures the representative nature of the groups for the population so that the results obtained with the groups can be generalized over the population ;

(d) the groups (samples) are large enough (n ≥ 30) so that their means and consequently, the differences between the means have normal sampling distributions ;

(e) the groups initially possess homogeneous variances, i.e., their variances are initially different estimates of the same population variance (homoscedasticity), differing only due to the sampling errors.

It follows from these assumptions that tests using the unit normal curve and z scores cannot be used if (i) the dependent variable is a discontinuous, ordinal or nominal variable, (ii) its scores have a non-normal or skewed distribution in the population, and (iii) the groups are of small sizes (n < 30).


The null hypothesis for a two-tail test proposes that there is no significant difference between the group means, that the means are estimates of the same population mean, that the observed difference between the means has resulted from mere chances of random sampling, and would not be there if the entire population were used instead of the randomly sampled groups. Thus, if μ₁ and μ₂ are the population means estimated respectively by X̄₁ and X̄₂ of the two groups,

$$H_0 : \mu_1 - \mu_2 = 0 ; \qquad H_a : \mu_1 - \mu_2 \neq 0 .$$

The probability P of the correctness of this H₀ is worked out and interpreted as follows.

(a) The observed difference (X̄₁ − X̄₂) between the two group means is converted to z score, starting with the assumptions that the H₀ is correct, and μ₁ and μ₂ are identical. Where s₁ and s₂ are the SDs of the respective groups, $s_{\bar{X}_1}$ and $s_{\bar{X}_2}$ are the SEs of the respective group means, $s_{\bar{X}_1 - \bar{X}_2}$ is the SE of the difference between means, (μ₁ − μ₂) is zero, and n₁ and n₂ are the respective group sizes,

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} .$$

(b) The fractional area of the unit normal curve, extending from its mean (μ) to the computed z score, is taken from the Area column of the unit normal curve table (Table A of Appendix) and used in working out the two-tail probability (2-tail P) of the H₀ being correct (page 82).

P = 2 [0.5000 − (fractional area from μ to the computed z)].

(c) The P thus worked out is next compared with the chosen level of significance (α). If P is lower than the chosen α, the probability of chance occurrence of the observed difference, as proposed by the H₀, is considered too low (P < α); if P equals the chosen α, then also the probability of chance occurrence of (X̄₁ − X̄₂) is considered too low (P = α); in both these cases, the H₀ is rejected, the Hₐ is accepted and the observed difference (X̄₁ − X̄₂) is considered significant. But if P exceeds the chosen α, the probability of the H₀ being correct is considered too high (P > α); so, H₀ is retained in this case and the observed difference (X̄₁ − X̄₂) is considered not significant.
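
Steps (a) to (c) of the two-tail test may be sketched in Python as follows. The function name `two_tail_z_test` is an illustrative assumption, and the unit normal curve table is replaced here by `statistics.NormalDist`. The means of 220 and 180 mg dL⁻¹ are borrowed from the cholesterol illustration in § 7.5, whereas the SDs (40 and 35) and the group sizes (50 each) are hypothetical values supplied only to make the sketch run.

```python
from math import sqrt
from statistics import NormalDist

def two_tail_z_test(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Two-tail z test for the difference between two large independent group means."""
    # (a) z score of the observed difference, with (mu1 - mu2) taken as zero under H0
    se_diff = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se_diff
    # (b) fractional area from the mean to the computed z, then the two-tail P
    area_mean_to_z = NormalDist().cdf(abs(z)) - 0.5
    p = 2 * (0.5 - area_mean_to_z)
    # (c) H0 is rejected only if P does not exceed alpha
    return z, p, p <= alpha

# Means from the text's cholesterol illustration; SDs and group sizes are hypothetical.
z, p, significant = two_tail_z_test(220, 40, 50, 180, 35, 50, alpha=0.05)
print(f"z = {z:.3f}, two-tail P = {p:.6f}, significant = {significant}")
```

With these assumed figures the computed z lies far beyond the two-tail critical z of 1.96, so P falls well below 0.05 and the H₀ would be rejected.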

One-tail test 

The null hypothesis for a one-tail test
proposes that one group mean is not
significantly higher (alternatively, lower) than
another, that the observed result, showing one
mean higher (or lower) than the other, has
arisen by chance due to random sampling, and
that no such result would have arisen if the
entire population were studied instead of the
randomly sampled groups. Thus,

either, $H_0 : \mu_1 \not> \mu_2$ ; $H_a : \mu_1 > \mu_2$ ;

or, $H_0 : \mu_1 \not< \mu_2$ ; $H_a : \mu_1 < \mu_2$.

The probability $P$ of the correctness of the
$H_0$ is worked out and interpreted as follows.

(a) The observed difference $(\bar{X}_1 - \bar{X}_2)$
between the two group means is converted to a $z$
score in the same way as in the case of the
two-tail test, starting with the assumption that
the $H_0$ is correct.

$$z = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$

(b) From the Area column of the unit
normal curve table (Table A of Appendix), the






fractional area of the unit normal curve from
its $\mu$ to the computed $z$ is taken and used in
working out the one-tail probability (1-tail $P$)
of correctness of the $H_0$ (pages 81-82).

$P = 0.5000 -$ (fractional area from $\mu$ to the
computed $z$).

(c) If the $P$ thus worked out is found to be
either lower than or equal to the chosen
significance level ($P \le \alpha$), it is considered too
low, the $H_0$ is rejected and one group mean is
considered significantly higher (alternatively,
lower) than the other. But if $P$ exceeds the
chosen $\alpha$ ($P > \alpha$), it is considered too high, the
$H_0$ is retained, and one group mean is not
significantly higher (alternatively, lower) than
the other.
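The one-tail procedure can be sketched in the same way. The function name `one_tail_z_test` and the `alternative` argument are assumptions introduced for illustration only. When the observed difference lies in the direction proposed by the $H_a$, the tail area computed below equals 0.5000 minus the fractional area from $\mu$ to the computed $z$; when it lies in the opposite direction, the tail area exceeds 0.5 and the $H_0$ is retained.

```python
from math import sqrt
from statistics import NormalDist


def one_tail_z_test(mean1, sd1, n1, mean2, sd2, n2,
                    alternative="greater", alpha=0.05):
    """Return (z, one-tail P, reject_h0).

    alternative="greater" tests Ha: mu1 > mu2;
    alternative="less" tests Ha: mu1 < mu2.
    """
    se_diff = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se_diff
    if alternative == "greater":
        p = 1.0 - NormalDist().cdf(z)  # area in the upper tail beyond z
    elif alternative == "less":
        p = NormalDist().cdf(z)        # area in the lower tail beyond z
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return z, p, p <= alpha


# Figures of Example 7.6.1, part (b): is the wild mean higher? (alpha = 0.01)
z, p, reject = one_tail_z_test(18.6, 2.20, 84, 17.7, 2.70, 86,
                               alternative="greater", alpha=0.01)
print(f"z = {z:.2f}, one-tail P = {p:.4f}, reject H0: {reject}")
```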


Example 7.6.1. 

The mean time for regeneration of amputated basal discs amounted to 18.6 hrs (SD 2.20 hrs) in 84 wild
(nonmutated) animals of Hydra vulgaris, and 17.7 hrs (SD 2.70 hrs) in 86 somatic mutants of that species.
(a) Find whether or not there is a significant difference between the mean regeneration times of the two
groups ($\alpha = 0.05$). (b) Is the mean regeneration time significantly higher in the wild Hydra than in the
mutants ($\alpha = 0.01$) ?


Solution :

Wild animals : $\bar{X}_1 = 18.6$ hrs ; $s_1 = 2.20$ hrs ; $n_1 = 84$.

Mutants : $\bar{X}_2 = 17.7$ hrs ; $s_2 = 2.70$ hrs ; $n_2 = 86$.

1. Two-tail test for the significance of the difference between means :

The $H_0$ proposes that the two means $\bar{X}_1$ and $\bar{X}_2$ do not differ significantly, the observed difference being due
only to chances of random sampling.

(a) Assuming the $H_0$ to be correct, the difference $(\bar{X}_1 - \bar{X}_2)$ between the means is first transformed
into the $z$ score.

$$s_{\bar{X}_1} = \frac{s_1}{\sqrt{n_1}} = \frac{2.20}{\sqrt{84}} = 0.24 ; \quad s_{\bar{X}_2} = \frac{s_2}{\sqrt{n_2}} = \frac{2.70}{\sqrt{86}} = 0.29 ;$$

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} = \sqrt{0.24^2 + 0.29^2} = 0.376 \text{ hr}.$$

$$z = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{18.6 - 17.7}{0.376} = 2.39.$$

Alternatively,

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{18.6 - 17.7}{\sqrt{\dfrac{(2.20)^2}{84} + \dfrac{(2.70)^2}{86}}} = 2.39.$$

(b) The two-tail probability $P$ of the computed $z$ score occurring by mere chance of random sampling is
then worked out using the unit normal curve table (Table A of Appendix).

$P = 2\,[0.5000 -$ (area of the unit normal curve from its $\mu$ to the computed $z$ of 2.39)$]$
$= 2\,[0.5000 - 0.4916] = 0.017.$

$\alpha = 0.05.$




So, the probability $P$ of 0.017 for obtaining the observed difference $(\bar{X}_1 - \bar{X}_2)$ or its $z$ score by mere
chance of random sampling is less than the chosen $\alpha$ of 0.05 and is thus considered too low. Hence, the $H_0$
is rejected and the difference between the means is considered significant ($P < 0.05$).

2. One-tail test for the significance of $\bar{X}_1$ being higher than $\bar{X}_2$ :

The $H_0$ contends that $\bar{X}_1$ is not significantly higher than $\bar{X}_2$. The difference $(\bar{X}_1 - \bar{X}_2)$ is converted to the $z$
score in the same way as for the two-tail test, given above. Thus, $z = 2.39$.

$P$ is worked out for a one-tail test, using the unit normal curve table (Table A of Appendix).

$P = 0.5000 -$ (area of the unit normal curve from its $\mu$ to the computed $z$ of 2.39)
$= 0.5000 - 0.4916 = 0.0084.$

$\alpha = 0.01.$

The computed $P$ of 0.0084 is considered too low as it is lower than the chosen $\alpha$ of 0.01. The $H_0$ is,
therefore, rejected and the mean regeneration time is considered significantly higher in the wild Hydra than
in the mutants ($P < 0.01$).


Example 7.6.2. 

In a modified form of Differential Aptitude Test, the mean score obtained by a sample of 374 girls 
amounted to 98.7 (SD 14.08) while the mean score of another sample of 255 boys was 95.5 (SD 13.02). Is 
there any significant difference between the mean scores of the two sexes ($\alpha = 0.01$) ?


Solution : 

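A two-tail test is undertaken to find the probability $P$ of correctness of the $H_0$ which contends that there
is no significant difference between the two sample (group) means, the observed difference being due only
to chances associated with random sampling.

For girls : $\bar{X}_1 = 98.7$ ; $s_1 = 14.08$ ; $n_1 = 374$.

For boys : $\bar{X}_2 = 95.5$ ; $s_2 = 13.02$ ; $n_2 = 255$.

(a) The difference between the two means is converted to the $z$ score.

$$s_{\bar{X}_1} = \frac{s_1}{\sqrt{n_1}} = \frac{14.08}{\sqrt{374}} = 0.728 ; \quad s_{\bar{X}_2} = \frac{s_2}{\sqrt{n_2}} = \frac{13.02}{\sqrt{255}} = 0.815.$$

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} = \sqrt{(0.728)^2 + (0.815)^2} = 1.093 ;$$

$$\text{or, } s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(14.08)^2}{374} + \frac{(13.02)^2}{255}} = 1.093.$$

$$z = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{98.7 - 95.5}{1.093} = 2.93.$$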















(b) The unit normal curve table (Table A of Appendix) is used to find the two-tail probability $P$ of
obtaining the computed $z$ score, and hence the observed difference between means, by chance due to
random sampling.

$P = 2\,[0.5000 -$ (area of the unit normal curve from its $\mu$ to the computed $z$ of 2.93)$]$
$= 2\,[0.5000 - 0.4983] = 0.0034.$

(c) Because the computed $P$ of 0.0034 is lower than the chosen $\alpha$ of 0.01, the probability of the $H_0$
being correct is considered too low. The $H_0$ is, therefore, rejected and the mean scores are considered
significantly different in the two sexes ($P < 0.01$).


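Example 7.6.3.

The mean systolic blood pressure amounted to 120.7 mm Hg (SD 25.90 mm Hg) in 40 normal men and to
139.5 mm Hg (SD 30.05 mm Hg) in 40 renal ischemia patients. Is there a significant difference between
the mean systolic pressures of the two groups ? Is the mean systolic pressure
significantly higher in renal ischemia patients ? ($\alpha = 0.01$).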


Solution : 

For ischemia patients : $\bar{X}_1 = 139.5$ mm Hg ; $s_1 = 30.05$ mm Hg ; $n_1 = 40$.

For normal men : $\bar{X}_2 = 120.7$ mm Hg ; $s_2 = 25.90$ mm Hg ; $n_2 = 40$.

1. Two-tail test for significance of difference between means :

The $H_0$ contends that there is no significant difference between $\bar{X}_1$ and $\bar{X}_2$, and the observed difference
has resulted from the chances associated with random sampling.

(a) The difference $(\bar{X}_1 - \bar{X}_2)$ between the means is transformed into the $z$ score.

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(30.05)^2}{40} + \frac{(25.90)^2}{40}} = 6.273 \text{ mm Hg}.$$

$$z = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{139.5 - 120.7}{6.273} = 3.00.$$

(b) The unit normal curve table (Table A of Appendix) is used to find the two-tail probability $P$ of
obtaining the computed $z$ score by chance due to random sampling.

$P = 2\,[0.5000 -$ (area of the unit normal curve from its $\mu$ to the computed $z$ of 3.00)$]$
$= 2\,[0.5000 - 0.4987] = 0.0026.$

$\alpha = 0.01.$

(c) Because the computed $P$ of 0.0026 is lower than the chosen $\alpha$ of 0.01, the probability of the $H_0$
being correct is considered too low. The $H_0$ is, therefore, rejected and the mean systolic BP is considered to
differ significantly between normal and ischemic men ($P < 0.01$).

2. One-tail test for the significance of $\bar{X}_1$ being higher than $\bar{X}_2$ :

The $H_0$ proposes in this case that $\bar{X}_1$ is not significantly higher than $\bar{X}_2$.




(a) The difference $(\bar{X}_1 - \bar{X}_2)$ is converted to the $z$ score in the same way as for the two-tail test, given above.
Thus, $z = 3.00$.

(b) The one-tail probability $P$ of obtaining the computed $z$ by chance is worked out, using the unit
normal curve table (Table A).

$P = 0.5000 -$ (area of the unit normal curve from its $\mu$ to the computed $z$ of 3.00)
$= 0.5000 - 0.4987 = 0.0013.$

(c) As the computed $P$ of 0.0013 is lower than the chosen $\alpha$ of 0.01, the probability of the $H_0$ being
correct is too low. The $H_0$ is, therefore, rejected and the mean systolic BP of ischemic patients is considered
to be significantly higher than that of normal individuals ($P < 0.01$).


7.7 t TESTS FOR SIGNIFICANCE OF
DIFFERENCE BETWEEN MEANS

In experiments using small groups or
samples ($n < 30$) drawn at random from a
population in which the dependent variable
under investigation has a normal distribution,
the scores of small groups are distributed in the
form of a $t$ distribution. So, to test the
significance of the difference between the
means of two small groups, their difference
$(\bar{X}_1 - \bar{X}_2)$ is converted to Student's $t$ score
which is then interpreted using the $t$
distribution.

Assumptions for t tests 

To apply the t test, it should be justifiable 
to assume that : 

(a) the dependent variable whose changes 
are being studied, is a continuous measurement 
variable ; 

(b) the variable has a normal distribution in 
the population ; 

(c) each score of the dependent variable 
occurs at random and independent of all other 
scores in the group (sample) ; 

(d) the groups have been sampled from
population(s) having homogeneous variances
(homoscedasticity) so that initially the variances
of the groups differ merely due to their
sampling errors.

It follows that $t$ tests cannot be used (i) if
the dependent variable is a discrete, ordinal or
nominal variable, (ii) if it has a non-normal or
skewed distribution in the population, or (iii) if
the populations, from which the groups have
been initially sampled, do not obey
homoscedasticity.

Degrees of freedom 

Because $t$ distributions vary with the
degrees of freedom ($df$) of the $t$ scores, the
computed $t$ must be referred for interpretation
to the $t$ distribution specific for the $df$ of the
computed $t$.

Because the unit normal curve closely
approximates or almost coincides with the $t$
distributions for high degrees of freedom (i.e.,
large sample sizes), the $t$ test can be applied not
only to small samples ($n < 30$) but also to
large ones, even to those with $n$ approaching
$\infty$. This is evident from the very close
similarities between the critical $t$ scores
($df = \infty$) and the critical $z$ scores for identical
levels of significance (Table 7.1). Thus, for
finding the significance of difference between
the means of two large groups or samples
($df = \infty$), either the $z$ score may be computed
and then referred to the unit normal curve
table, or the $t$ score may be computed and then
compared with the critical $t$ scores ($df = \infty$) for
different levels of significance. But $t$ tests alone
are applicable to small groups because their




distributions then conform to leptokurtic t 
distributions only, instead of the mesokurtic 
normal distribution. 

Two-tail t test : 

This is undertaken to find the significance
of an observed difference between two means,
irrespective of the algebraic sign of the
difference. It investigates the probability $P$ of
correctness of the null hypothesis ($H_0$) which
proposes that the two means are not
significantly different from each other and are
estimates of the same population parameter
(§ 7.5).

$H_0 : \mu_1 = \mu_2$ ; $H_a : \mu_1 \neq \mu_2$.

Assuming the $H_0$ to be correct, the observed
difference $(\bar{X}_1 - \bar{X}_2)$ between the sample means
is transformed into a $t$ score using the same basic
formula as is used in computing the $z$ score
(§ 7.6).

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

However, $s_{\bar{X}_1 - \bar{X}_2}$ may be computed here in
several alternative ways according to the nature
of the group or sample and the type of
experiment.

The probability $P$ of the $H_0$ being correct is
obtained from the fractional areas in the two
tails of the relevant $t$ distribution beyond the
computed $t$ score. For this, the computed $t$
score is compared with the two-tail critical $t$
score having the same $df$ as that of the
computed $t$, and for the chosen level of
significance ($\alpha$). So long as the computed $t$
exceeds or equals the critical $t$ score, it falls
within the rejection region ($\alpha$) of the $t$
distribution beyond the critical $t_\alpha$ score so that
the $P$ given by the total fractional area in the
two tails beyond the computed $t$ does not
exceed $\alpha$ ($P \le \alpha$). The $H_0$ is then considered
to have too low a probability of being correct
and may be rejected ; the means are then
considered to differ significantly from each
other. But if the computed $t$ is lower than the
critical $t$ score, the $P$ given by the total
fractional area in the two tails beyond the
computed $t$ exceeds the chosen $\alpha$ ($P > \alpha$) ;
the $H_0$ is then taken to have a high probability of
being correct and cannot be rejected ; the
difference between the means is then
considered not significant.

One-tail t test : 

This is used to explore whether or not one
of the means is significantly higher
(alternatively, lower) than the other. It explores
the $P$ of correctness of the $H_0$ which contends
that one mean is not significantly higher
(alternatively, lower) than the other.

Either, $H_0 : \mu_1 \not> \mu_2$ ; $H_a : \mu_1 > \mu_2$ ;

or, $H_0 : \mu_1 \not< \mu_2$ ; $H_a : \mu_1 < \mu_2$.

The observed difference $(\bar{X}_1 - \bar{X}_2)$ between
the means is converted to a $t$ score in the same
way as in the two-tail test. The computed $t$
score is compared with the one-tail critical $t$
score for the chosen $\alpha$ and having the same $df$.
If the computed $t$ either exceeds or equals the
critical $t$, the fractional area in one tail beyond
the computed $t$, corresponding to the 1-tail $P$,
does not exceed the rejection region ($\alpha$)
beyond the critical 1-tail $t_\alpha$ in that tail ($P \le \alpha$).
The $H_0$ is then considered to have too low a $P$
and may be rejected ; one of the means is
then considered significantly higher (or, lower)
than the other mean. But if the computed $t$ is
lower than the critical $t$, the fractional area ($P$)
in one tail beyond the computed $t$ exceeds the
rejection region ($\alpha$) beyond the critical $t_\alpha$
($P > \alpha$) ; the $H_0$ cannot be rejected as the $P$ is
considered too high ; one of the means is
then not significantly higher (or, lower) than
the other.

1. t tests for independent groups

Two or more groups of individuals, used in 
an independent group experiment, are randomly 
sampled from the population independent of 






each other so that they consist of separate sets
of individuals and may or may not be identical
in size. After the groups have been exposed to
or treated with different levels (doses,
concentrations, qualities, etc.) of the
independent variable, the dependent variable
being investigated is measured in all the
groups. The scores of the dependent variable,
thus obtained from different independent
groups, constitute unpaired observations and
are uncorrelated to each other. The significance
of the difference between the means of two
such groups is found by $t$ tests in different
ways according as the groups are large or
small, and equal or unequal in size.

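(a) For small and unequal-size independent
groups :

For two independent groups of unequal
sizes ($n_1 \neq n_2$), either or both being small in
size ($< 30$), the observed difference $(\bar{X}_1 - \bar{X}_2)$
between their means is converted to a $t$ score
using a pooled SD ($s$) or a common variance
($s^2$). Where $s_1^2$ and $s_2^2$ are the variances of the
respective groups, $s_{\bar{X}_1}$ and $s_{\bar{X}_2}$ are the SEs of
the respective means, and $s_{\bar{X}_1 - \bar{X}_2}$ is the SE of
the difference between the means.

$$s = \sqrt{\frac{\Sigma(X_1 - \bar{X}_1)^2 + \Sigma(X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}} ;$$

$$s_{\bar{X}_1 - \bar{X}_2} = s\sqrt{\frac{n_1 + n_2}{n_1 n_2}} ;$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} ; \quad \text{or, } t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\Sigma(X_1 - \bar{X}_1)^2 + \Sigma(X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2} \cdot \dfrac{n_1 + n_2}{n_1 n_2}}} ;$$

$df = n_1 + n_2 - 2.$

(b) For both small and large independent
groups of equal size :

If two independent groups possess the same
size ($n$), $s$ and $t$ may be computed as follows,
irrespective of the group sizes.

$$s = \sqrt{\frac{\Sigma(X_1 - \bar{X}_1)^2 + \Sigma(X_2 - \bar{X}_2)^2}{2(n - 1)}} = \sqrt{\frac{s_1^2 + s_2^2}{2}} ;$$

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} = \sqrt{\frac{s_1^2 + s_2^2}{n}} ;$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} ; \quad \text{or, } t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\Sigma(X_1 - \bar{X}_1)^2 + \Sigma(X_2 - \bar{X}_2)^2}{n(n - 1)}}} ;$$

$df = 2(n - 1).$

(c) For large groups of unequal sizes :

Where both the groups have large but
unequal sizes ($n_1 > 30$, $n_2 > 30$ ; $n_1 \neq n_2$), the
$t$ score is computed for $(\bar{X}_1 - \bar{X}_2)$ using the SDs
of the individual groups, instead of the pooled
SD.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} ;$$

$df = n_1 + n_2 - 2.$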
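The computations for cases (a) and (b) can be sketched in Python from the group means, SDs and sizes. The function names `t_pooled_unequal` and `t_equal_size` are hypothetical, chosen only for this illustration; the figures used in the calls are those of Examples 7.7.2 and 7.7.3 of this section.

```python
from math import sqrt


def t_pooled_unequal(mean1, sd1, n1, mean2, sd2, n2):
    """Case (a): t score and df using the pooled SD of two unequal groups."""
    pooled_var = (sd1 ** 2 * (n1 - 1) + sd2 ** 2 * (n2 - 1)) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var) * sqrt((n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / se_diff, n1 + n2 - 2


def t_equal_size(mean1, sd1, mean2, sd2, n):
    """Case (b): t score and df for two independent groups of the same size n."""
    se_diff = sqrt((sd1 ** 2 + sd2 ** 2) / n)
    return (mean1 - mean2) / se_diff, 2 * (n - 1)


# Example 7.7.2: 15 wild animals versus 12 mutants
t_a, df_a = t_pooled_unequal(18.6, 2.20, 15, 17.7, 2.70, 12)
# Example 7.7.3: 16 boys versus 16 girls
t_b, df_b = t_equal_size(40.3, 8.15, 37.5, 6.35, 16)
print(f"case (a): t = {t_a:.3f}, df = {df_a}")
print(f"case (b): t = {t_b:.3f}, df = {df_b}")
```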


Interpretation of computed t : (i) The $t$
score, computed by any of the above methods, is
compared with critical $t$ scores, having the
same $df$ as that of the computed $t$, but different
levels of significance. (ii) Either two-tail or
one-tail critical $t$ scores are used in this




comparison, according as a two-tail or a one-tail
$t$ test is intended. (iii) The algebraic sign of
the computed $t$ is ignored in this comparison.
(iv) If the computed $t$ either equals or exceeds
the critical $t$ for a chosen level of significance
($\alpha$) not exceeding 0.05, the $H_0$ is rejected and
the observed results are considered significant
respectively at or below that level
($P \le \alpha$). But (v) if the critical $t$ for a chosen $\alpha$
exceeds the computed $t$, the $H_0$ cannot be
rejected and the observed results are not
considered significant ($P > \alpha$).
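The comparison in points (i) to (v) can be sketched as a small Python helper. The name `smallest_significant_level` and the dictionary layout are assumptions made for illustration; the critical $t$ scores themselves must still be read from Table B of the Appendix for the relevant $df$ and for the intended one-tail or two-tail test.

```python
def smallest_significant_level(computed_t, critical_t_by_alpha, max_alpha=0.05):
    """Return the lowest tabulated alpha at which H0 is rejected, else None.

    critical_t_by_alpha maps each level of significance to its critical t
    for the df of the computed t. The algebraic sign of computed_t is
    ignored, as in point (iii).
    """
    t_abs = abs(computed_t)
    rejected_at = [
        alpha
        for alpha, critical_t in critical_t_by_alpha.items()
        if alpha <= max_alpha and t_abs >= critical_t
    ]
    return min(rejected_at) if rejected_at else None


# Two-tail critical t scores for df = 43, as quoted in Example 7.7.1
two_tail_df43 = {0.05: 2.017, 0.02: 2.416, 0.01: 2.695, 0.001: 3.532}
level = smallest_significant_level(3.515, two_tail_df43)
print("significant at alpha =", level)
```

A return value of `None` corresponds to point (v): the $H_0$ cannot be rejected at any tabulated level up to 0.05.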



Example 7.7.1. 

The heights (cm) of 25 male and 20 female college students are presented in the first two columns of 
Table 7.2. Find if there is a significant difference between the mean heights of male and female college 
students. 

Solution : 

The $H_0$ proposes that there is no significant difference between the means. To find the probability $P$ of
the $H_0$ being correct, a two-tail $t$ test is done using the pooled SD ($s$) because the groups are small and of
unequal sizes.

Table 7.2. Table for computing means and standard deviations of body height data.

       Heights (cm)
  Males     Females     $X_1 - \bar{X}_1$   $(X_1 - \bar{X}_1)^2$   $X_2 - \bar{X}_2$   $(X_2 - \bar{X}_2)^2$
  ($X_1$)   ($X_2$)
  163       164          -3.4        11.56        +2.9         8.41
  165       155          -1.4         1.96        -6.1        37.21
  170       160          +3.6        12.96        -1.1         1.21
  162       154          -4.4        19.36        -7.1        50.41
  160       160          -6.4        40.96        -1.1         1.21
  165       153          -1.4         1.96        -8.1        65.61
  170       159          +3.6        12.96        -2.1         4.41
  165       166          -1.4         1.96        +4.9        24.01
  164       163          -2.4         5.76        +1.9         3.61
  181       166         +14.6       213.16        +4.9        24.01
  169       163          +2.6         6.76        +1.9         3.61
  161       165          -5.4        29.16        +3.9        15.21
  162       167          -4.4        19.36        +5.9        34.81
  165       164          -1.4         1.96        +2.9         8.41
  163       162          -3.4        11.56        +0.9         0.81
  168       160          +1.6         2.56        -1.1         1.21
  169       159          +2.6         6.76        -2.1         4.41
  164       167          -2.4         5.76        +5.9        34.81
  180       157         +13.6       184.96        -4.1        16.81
  160       158          -6.4        40.96        -3.1         9.61
  160                    -6.4        40.96
  167                    +0.6         0.36
  174                    +7.6        57.76
  168                    +1.6         2.56
  165                    -1.4         1.96
  Σ 4160    3222                    736.00                   349.80




(a) Using the data presented in Table 7.2, the group means $\bar{X}_1$ and $\bar{X}_2$ are first computed.

$$\bar{X}_1 = \frac{\Sigma X_1}{n_1} = \frac{4160}{25} = 166.4 \text{ cm} ; \quad \bar{X}_2 = \frac{\Sigma X_2}{n_2} = \frac{3222}{20} = 161.1 \text{ cm}.$$

(b) The deviations of the scores from the respective group means are worked out and squared, and the
squared deviations of each group are totalled to give the respective sums of squares, $\Sigma(X_1 - \bar{X}_1)^2$ and
$\Sigma(X_2 - \bar{X}_2)^2$.

$\Sigma(X_1 - \bar{X}_1)^2 = 736.00$ cm² ; $\Sigma(X_2 - \bar{X}_2)^2 = 349.80$ cm².

(c) The pooled SD ($s$) is computed using the sums of squares.

$$s = \sqrt{\frac{\Sigma(X_1 - \bar{X}_1)^2 + \Sigma(X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{736.00 + 349.80}{25 + 20 - 2}} = 5.025 \text{ cm}.$$

(d) The SE of the difference between means ($s_{\bar{X}_1 - \bar{X}_2}$) is computed using the pooled SD.

$$s_{\bar{X}_1 - \bar{X}_2} = s\sqrt{\frac{n_1 + n_2}{n_1 n_2}} = 5.025\sqrt{\frac{25 + 20}{25 \times 20}} = 1.508 \text{ cm}.$$

(e) The difference $(\bar{X}_1 - \bar{X}_2)$ between the means is converted to a $t$ score.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{166.4 - 161.1}{1.508} = 3.515.$$

$df = n_1 + n_2 - 2 = 25 + 20 - 2 = 43.$

[Alternatively,

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\Sigma(X_1 - \bar{X}_1)^2 + \Sigma(X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2} \cdot \dfrac{n_1 + n_2}{n_1 n_2}}} = \frac{166.4 - 161.1}{\sqrt{\dfrac{736.00 + 349.80}{25 + 20 - 2} \cdot \dfrac{25 + 20}{25 \times 20}}} = 3.515.]$$

(f) For different levels of significance, two-tail critical $t$ scores ($df = 43$) are quoted below from Table
B of Appendix.

$t_{.05(43)} = 2.017$ ; $t_{.02(43)} = 2.416$ ; $t_{.01(43)} = 2.695$ ; $t_{.001(43)} = 3.532$.

The computed $t$ of 3.515 is found to be higher than the critical $t_{.01(43)}$ but lower than the critical $t_{.001(43)}$. So,
the probability $P$ of getting the observed difference between the means by chance due to random sampling
amounts to less than 0.01, which may be considered too low. Hence, the $H_0$ may be rejected. So, the mean
height of males differs significantly from that of females ($P < 0.01$).
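The steps of this example can be retraced in Python directly from the raw heights of Table 7.2. This is a sketch for checking the arithmetic only; the variable and function names are assumptions, and because the code carries full precision through every step, the last decimal place may differ slightly from the hand computation.

```python
from math import sqrt

# Heights (cm) from Table 7.2
males = [163, 165, 170, 162, 160, 165, 170, 165, 164, 181, 169, 161, 162,
         165, 163, 168, 169, 164, 180, 160, 160, 167, 174, 168, 165]
females = [164, 155, 160, 154, 160, 153, 159, 166, 163, 166,
           163, 165, 167, 164, 162, 160, 159, 167, 157, 158]


def mean(scores):
    return sum(scores) / len(scores)


def sum_of_squares(scores):
    """Sum of squared deviations of the scores from their own mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)


n1, n2 = len(males), len(females)
mean1, mean2 = mean(males), mean(females)                       # step (a)
ss1, ss2 = sum_of_squares(males), sum_of_squares(females)       # step (b)
pooled_sd = sqrt((ss1 + ss2) / (n1 + n2 - 2))                   # step (c)
se_diff = pooled_sd * sqrt((n1 + n2) / (n1 * n2))               # step (d)
t = (mean1 - mean2) / se_diff                                   # step (e)
df = n1 + n2 - 2

print(f"means: {mean1:.1f}, {mean2:.1f}; sums of squares: {ss1:.2f}, {ss2:.2f}")
print(f"pooled SD = {pooled_sd:.3f}, SE of difference = {se_diff:.3f}")
print(f"t = {t:.3f}, df = {df}")
```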

Example 7.7.2. 

The mean time for regeneration of amputated basal discs was found to be 18.6 hrs (SD 2.20) in 15 wild
(nonmutated) animals of Hydra vulgaris, and 17.7 hrs (SD 2.70) in 12 somatic mutants of that species. Is
the mean regeneration time significantly higher in the wild animals than in the mutants ?




Solution :

The $H_0$ proposes that the mean regeneration time is not significantly higher in wild animals than in
mutants. A one-tail $t$ test is done to find the probability $P$ of this $H_0$ being correct.

Wild animals : $\bar{X}_1 = 18.6$ hrs ; $s_1 = 2.20$ hrs ; $n_1 = 15$.

Mutants : $\bar{X}_2 = 17.7$ hrs ; $s_2 = 2.70$ hrs ; $n_2 = 12$.

(a) The pooled SD ($s$) of the small, unequal groups is computed using their respective SDs ($s_1$
and $s_2$).

$$s = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}} = \sqrt{\frac{(2.20)^2(15 - 1) + (2.70)^2(12 - 1)}{15 + 12 - 2}} = 2.433 \text{ hrs}.$$

(b) The difference $(\bar{X}_1 - \bar{X}_2)$ between the means is next converted to a $t$ score.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s\sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}} = \frac{18.6 - 17.7}{2.433\sqrt{\dfrac{15 + 12}{15 \times 12}}} = 0.955.$$

$df = n_1 + n_2 - 2 = 15 + 12 - 2 = 25.$

(c) The computed $t$ score is compared with one-tail critical $t$ scores ($df = 25$) for different levels of
significance (Table B of Appendix).

$t_{.05(25)} = 1.708$ ; $t_{.025(25)} = 2.060$ ; $t_{.01(25)} = 2.485$ ; $t_{.005(25)} = 2.787$.

The computed $t$ is found to be lower than the critical $t_{.05(25)}$. So, the probability $P$ of the $H_0$ being correct is
higher than 0.05 and may be considered too high. Hence, the $H_0$ cannot be rejected. So, the mean
regeneration time is not significantly higher in wild animals ($P > 0.05$).


Example 7.7.3. 

In a numerical operations test, the mean and SD of the scores of 16 boys were found to be 40.3 and
8.15 respectively ; these values amounted to 37.5 and 6.35, respectively, for 16 girls. Is there a significant
difference between the means of boys and girls ?


Solution : 

The $H_0$ proposes that there is no significant difference between the means of the two groups. To find the
probability $P$ of this $H_0$ being correct, a two-tail $t$ test is done.

Boys : $\bar{X}_1 = 40.3$ ; $s_1 = 8.15$. Girls : $\bar{X}_2 = 37.5$ ; $s_2 = 6.35$. Size of each sample : $n = 16$.

(a) The difference $(\bar{X}_1 - \bar{X}_2)$ between the means is converted to a $t$ score, using the unbiased SDs of the
independent groups of equal size.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}} = \frac{40.3 - 37.5}{\sqrt{\dfrac{(8.15)^2 + (6.35)^2}{16}}} = 1.084.$$

$df = 2(n - 1) = 2(16 - 1) = 30.$





(b) The computed $t$ score is compared with two-tail critical $t$ scores ($df = 30$) for different levels of
significance.

$t_{.05(30)} = 2.042$ ; $t_{.02(30)} = 2.457$ ; $t_{.01(30)} = 2.750$.

As the computed $t$ of 1.084 is lower than even the critical $t$ for the 0.05 level of significance, the
probability $P$ of the $H_0$ being correct exceeds 0.05 and is considered too high. The $H_0$ cannot be rejected
and it is inferred that the mean scores do not differ significantly ($P > 0.05$).


Example 7.7.4.

The body weights (kg) of 8 adult males and 8 adult females are presented respectively in the first and
second columns of Table 7.3. Find whether or not the mean weight of males is significantly higher than that
of females.

Solution :

The $H_0$ contends that the mean weight of males is not significantly higher than that of females. A one-tail
$t$ test is undertaken to estimate the probability $P$ of this $H_0$ being correct.



Table 7.3. Table for computing mean body weights and sums of squares.

       Weights (kg)
  Males     Females     $X_1 - \bar{X}_1$   $(X_1 - \bar{X}_1)^2$   $X_2 - \bar{X}_2$   $(X_2 - \bar{X}_2)^2$
  ($X_1$)   ($X_2$)
  50        49           -7          49           -3           9
  58        52           +1           1            0           0
  60        51           +3           9           -1           1
  55        56           -2           4           +4          16
  59        55           +2           4           +3           9
  56        53           -1           1           +1           1
  54        52           -3           9            0           0
  64        48           +7          49           -4          16
  Σ 456     416                     126                       52


(a) Using the data presented in Table 7.3, the group means, $\bar{X}_1$ and $\bar{X}_2$, are first computed. For each
group, $n = 8$.

$$\bar{X}_1 = \frac{\Sigma X_1}{n} = \frac{456}{8} = 57 \text{ kg} ; \quad \bar{X}_2 = \frac{\Sigma X_2}{n} = \frac{416}{8} = 52 \text{ kg}.$$

(b) The deviations of scores of each group from its mean are worked out and squared. The squared
deviations are totalled for each group to give the sum of squares. From Table 7.3,

$\Sigma(X_1 - \bar{X}_1)^2 = 126$ kg² ; $\Sigma(X_2 - \bar{X}_2)^2 = 52$ kg².

(c) The sums of squares are used in computing the variances of the respective groups.

$$s_1^2 = \frac{\Sigma(X_1 - \bar{X}_1)^2}{n - 1} = \frac{126}{8 - 1} = 18.0 \text{ kg}^2 ; \quad s_2^2 = \frac{\Sigma(X_2 - \bar{X}_2)^2}{n - 1} = \frac{52}{8 - 1} = 7.4 \text{ kg}^2.$$

(d) The difference $(\bar{X}_1 - \bar{X}_2)$ between the means is converted to a $t$ score, using the variances.




$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}} = \frac{57 - 52}{\sqrt{\dfrac{18.0 + 7.4}{8}}} = 2.809 ;$$

$df = 2(n - 1) = 2(8 - 1) = 14.$

[Alternatively, the $t$ score may be computed using the sums of squares, omitting step (c).

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\Sigma(X_1 - \bar{X}_1)^2 + \Sigma(X_2 - \bar{X}_2)^2}{n(n - 1)}}} = \frac{57 - 52}{\sqrt{\dfrac{126 + 52}{8(8 - 1)}}} = 2.809.]$$

(e) The computed $t$ score is compared with one-tail critical $t$ scores ($df = 14$) for different levels of
significance (Table B of Appendix).

$t_{.05(14)} = 1.761$ ; $t_{.025(14)} = 2.145$ ; $t_{.01(14)} = 2.624$ ; $t_{.005(14)} = 2.977$.

As the computed $t$ of 2.809 is higher than the critical $t$ for the 0.01 level of significance, the probability
$P$ of correctness of the $H_0$ is less than 0.01 and is, therefore, considered too low. So, the $H_0$ is rejected. It
is inferred that the mean body weight of males is significantly higher than that of females ($P < 0.01$).


Example 7.7.5. 

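The mean and the SD of birthweights were found to be 2.9 kg and 0.65 kg respectively for 832 first-born
infants, and 3.3 kg and 0.55 kg respectively for 608 third-born infants. Is the mean birthweight
significantly higher in the third-born infants ?

Solution :

The $H_0$ proposes that the mean birthweight is not significantly higher in the third-born infants. A one-tail
$t$ test is done to find the probability $P$ of this $H_0$ being correct.

For first-born infants : $\bar{X}_1 = 2.9$ kg ; $s_1 = 0.65$ kg ; $n_1 = 832$.

For third-born infants : $\bar{X}_2 = 3.3$ kg ; $s_2 = 0.55$ kg ; $n_2 = 608$.

(a) Because both the groups are large ($n_1 > 30$, $n_2 > 30$) and their sizes are unequal, the
$t$ score for the difference $(\bar{X}_1 - \bar{X}_2)$ between the means is computed using their respective SDs.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} = \frac{2.9 - 3.3}{\sqrt{\dfrac{(0.65)^2}{608} + \dfrac{(0.55)^2}{832}}} = -12.295 ; \quad df = n_1 + n_2 - 2 = 832 + 608 - 2 = 1438.$$

(b) The computed $t$ is compared, ignoring its algebraic sign, with one-tail critical $t$ scores ($df = \infty$) for
different levels of significance, quoted from Table B of Appendix.

$t_{.05(\infty)} = 1.645$ ; $t_{.005(\infty)} = 2.576$.

As the computed $t$ is higher than even the critical $t_{.005(\infty)}$, the probability $P$ of the $H_0$ being correct is
less than 0.005 and is considered too low. The $H_0$ is rejected, and it is inferred that the third-born infants have a
significantly higher mean birthweight ($P < 0.005$).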




Example 7.7.6.

Find whether or not there is a significant difference between the mean winglength scores (mm) of the
following two groups of houseflies sampled from two different habitats.

Winglengths (mm) :

(i) 4.9, 5.2, 4.7, 5.3, 3.9, 5.4, 4.5, 4.9, 4.8, 5.0, 4.2, 4.8.

(ii) 3.1, 3.7, 3.6, 4.0, 3.3, 3.4, 3.3, 3.2, 3.0, 3.4.

Solution : . 

The $H_0$ proposes that there is no significant difference between the means. A two-tail $t$ test is done,
using the pooled SD ($s$) of the two small independent groups of unequal sizes.

(a) Entering the data in the first two columns of Table 7.4, the group means, $\bar{X}_1$ and $\bar{X}_2$, are computed.

$$n_1 = 12 ; \quad \bar{X}_1 = \frac{\Sigma X_1}{n_1} = \frac{57.6}{12} = 4.8 \text{ mm}.$$

$$n_2 = 10 ; \quad \bar{X}_2 = \frac{\Sigma X_2}{n_2} = \frac{34.0}{10} = 3.4 \text{ mm}.$$


Table 7.4. Table for computing mean winglengths and sums of squares.

       Winglengths (mm)
  Group 1   Group 2     $X_1 - \bar{X}_1$   $(X_1 - \bar{X}_1)^2$   $X_2 - \bar{X}_2$   $(X_2 - \bar{X}_2)^2$
  ($X_1$)   ($X_2$)
  4.9       3.1         +0.1        0.01        -0.3        0.09
  5.2       3.7         +0.4        0.16        +0.3        0.09
  4.7       3.6         -0.1        0.01        +0.2        0.04
  5.3       4.0         +0.5        0.25        +0.6        0.36
  3.9       3.3         -0.9        0.81        -0.1        0.01
  5.4       3.4         +0.6        0.36         0          0
  4.5       3.3         -0.3        0.09        -0.1        0.01
  4.9       3.2         +0.1        0.01        -0.2        0.04
  4.8       3.0          0          0           -0.4        0.16
  5.0       3.4         +0.2        0.04         0          0
  4.2                   -0.6        0.36
  4.8                    0          0
  Σ 57.6    34.0                    2.10                    0.80


(b) The deviations of the scores of each group from its mean are worked out and squared. The squared
deviations are totalled for each group to give the respective sums of squares which are then used in working
out the pooled SD ($s$). From Table 7.4,

$\Sigma(X_1 - \bar{X}_1)^2 = 2.10$ mm² ; $\Sigma(X_2 - \bar{X}_2)^2 = 0.80$ mm² ;

$$s = \sqrt{\frac{\Sigma(X_1 - \bar{X}_1)^2 + \Sigma(X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{2.10 + 0.80}{12 + 10 - 2}} = 0.38 \text{ mm}.$$

(c) The SE of the difference between means ($s_{\bar{X}_1 - \bar{X}_2}$) is computed using the pooled SD.

$$s_{\bar{X}_1 - \bar{X}_2} = s\sqrt{\frac{n_1 + n_2}{n_1 n_2}} = 0.38\sqrt{\frac{12 + 10}{12 \times 10}} = 0.163 \text{ mm}.$$




(d) The difference $(\bar{X}_1 - \bar{X}_2)$ between the means is converted to a $t$ score.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{4.8 - 3.4}{0.163} = 8.589 ; \quad df = n_1 + n_2 - 2 = 12 + 10 - 2 = 20.$$

(e) The computed $t$ score is compared with the following two-tail critical $t$ scores ($df = 20$) quoted from
Table B of Appendix.

$t_{.05(20)} = 2.086$ ; $t_{.02(20)} = 2.528$ ; $t_{.01(20)} = 2.845$ ; $t_{.001(20)} = 3.850$.

As the computed $t$ of 8.589 is found to be higher than the critical $t_{.001(20)}$, the probability $P$ of correctness
of the $H_0$ is less than 0.001 and may be considered too low. So, the $H_0$ is rejected, and it is inferred that
there is a significant difference between the mean winglengths of the two groups ($P < 0.001$).


Example 7.7.7. 

The mean and SD of steadiness test scores were found to be respectively 5.9 and 1.85 in a group of 40
women, and respectively 5.1 and 1.42 in a group of 42 men. Is the mean steadiness score significantly
higher in women than in men ?

Solution : 

The $H_0$ proposes that the mean steadiness score is not significantly higher in women than in men. A
one-tail $t$ test is undertaken to work out the probability $P$ of this $H_0$ being correct.

Women : $\bar{X}_1 = 5.9$ ; $s_1 = 1.85$ ; $n_1 = 40$.

Men : $\bar{X}_2 = 5.1$ ; $s_2 = 1.42$ ; $n_2 = 42$.

(a) Because both the groups are large ($> 30$) and unequal in size, the $t$ score of the difference
$(\bar{X}_1 - \bar{X}_2)$ between their means is computed using their respective SDs.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} = \frac{5.9 - 5.1}{\sqrt{\dfrac{(1.85)^2}{42} + \dfrac{(1.42)^2}{40}}} = 2.203 ;$$

$df = n_1 + n_2 - 2 = 40 + 42 - 2 = 80.$

(b) The computed $t$ is compared with the following one-tail critical $t$ scores ($df = 80$), quoted from Table
B of Appendix.

$t_{.05(80)} = 1.664$ ; $t_{.025(80)} = 1.990$ ; $t_{.01(80)} = 2.374$.

The computed $t$ of 2.203 is found to be higher than the critical $t_{.025(80)}$ but lower than the critical $t_{.01(80)}$. So,
the probability $P$ of the $H_0$ being correct is less than 0.025, which may be considered too low. Hence, the
$H_0$ is rejected, and it is inferred that the mean steadiness score is significantly higher in women than in men
($P < 0.025$).

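2. t test for paired observations of a single group

In a single-group experiment, a single group of individuals, sampled at random from a
population, serves initially as the control group and subsequently as the experimental group.
The two sets of scores of the dependent variable, obtained from the single group after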






exposure to the respective levels of the
independent variable, form paired observations,
each individual having one pair of
scores, and are correlated with each other. For
example, the strength of the knee jerk reflex
may be initially measured in a group of
individuals after injecting them with placebo
free from adrenaline (control group) ; they are
then injected with adrenaline and the strength
of the knee jerk reflex is subsequently
measured again in them (experimental group).
These two sets of knee jerk strength scores
form paired and correlated observations. The
effect of the independent variable on the
dependent one is explored by estimating the
significance of difference between the means of
such paired observations by $t$ test. However, the
computation of $t$ for such paired scores requires
the use of the correlation coefficient ($r$)
between the two sets of scores ; but $r$ cannot
be computed that correctly for small groups.
So, to bypass the use of the correlation
coefficient, a $t$ test by the difference method is
used for finding the significance of the
difference between means of paired
observations of a small group ($n < 30$) in such
a single-group experiment.

For a two-tail t test by the difference method, the $H_0$ proposes that the mean difference ($\bar{D}$) between the paired scores does not differ significantly from 0. For a one-tail t test, the $H_0$ proposes that $\bar{D}$ does not have a significant positive (alternatively, negative) value. To estimate the probability P of correctness of the $H_0$, the mean difference ($\bar{D}$) is converted to a t score. Where n is the number of pairs of scores, D is the difference between the scores of any pair, $\Sigma D$ is the sum of all such differences between paired scores, $\bar{D}$ and $s_D$ are respectively the mean and the SD of those differences, and $s_{\bar{D}}$ is the SE of $\bar{D}$,

$$\bar{D} = \frac{\Sigma D}{n} ; \quad s_D = \sqrt{\frac{\Sigma (D - \bar{D})^2}{n - 1}} ; \quad s_{\bar{D}} = \frac{s_D}{\sqrt{n}} ; \quad t = \frac{\bar{D}}{s_{\bar{D}}} ; \quad df = n - 1.$$

Alternatively, t may be computed directly from the differences (D) between the paired scores and the squared values ($D^2$) of those differences.

$$t = \frac{\Sigma D}{\sqrt{\dfrac{n \Sigma D^2 - (\Sigma D)^2}{n - 1}}} ; \quad df = n - 1.$$

The computed t score is then compared with the critical t scores with the same df. The difference between the paired observations is considered significant if the computed t equals or exceeds the critical t for the chosen level of significance (P ≤ α). On the contrary, the difference is not significant if the computed t is lower than the critical t for the chosen significance level (P > α).
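The two routes to t described above (through the mean and SD of the differences, and directly through ΣD and ΣD²) can be checked against each other with a short sketch. The function names are illustrative only, and the scores are the achievement test scores of Example 7.7.8; both routes should give a t of about 5.12 with df = 9.

```python
import math

def paired_t_mean_difference(x1, x2):
    """t by the difference method, via the mean, SD and SE of the differences."""
    d = [b - a for a, b in zip(x1, x2)]              # D = X2 - X1 for each pair
    n = len(d)
    d_bar = sum(d) / n                               # mean difference
    s_d = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))  # SD of differences
    se = s_d / math.sqrt(n)                          # SE of the mean difference
    return d_bar / se, n - 1

def paired_t_squared_differences(x1, x2):
    """t by the difference method, directly from sum(D) and sum(D^2)."""
    d = [b - a for a, b in zip(x1, x2)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1

before = [72, 67, 90, 97, 84, 92, 65, 75, 80, 69]
after = [120, 81, 110, 103, 109, 137, 115, 82, 110, 89]

t_mean, df = paired_t_mean_difference(before, after)
t_sq, _ = paired_t_squared_differences(before, after)
print(round(t_mean, 3), round(t_sq, 3), df)
```

The two functions are algebraically equivalent, so any disagreement beyond floating-point noise would point to an arithmetic slip in one of the tables.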


Example 7.7.8.

The achievement test scores of 10 students, before and after practice, are given below. Does practice make a significant difference in achievement test scores ?

Individuals           :    1    2    3    4    5    6    7    8    9   10
Achievement scores
(i) before practice   :   72   67   90   97   84   92   65   75   80   69
(ii) after practice   :  120   81  110  103  109  137  115   82  110   89



Solution : 

The $H_0$ contends that there is no significant difference between the paired scores, any observed difference being due to mere chances associated with random sampling. To estimate the probability P of correctness of this $H_0$, a two-tail t test by the difference method is applicable, as the paired scores belong to a small single group.


1. First method :

(a) The difference D is worked out between the paired scores ($X_1$ and $X_2$) of each individual and entered in Table 7.5. These differences for all the pairs of scores are totalled to give $\Sigma D$, which is used in working out the mean difference $\bar{D}$.





Table 7.6. Table for t test of achievement test scores by the difference method, using the squared differences.

Individuals   Before practice ($X_1$)   After practice ($X_2$)   D = ($X_2 - X_1$)   $D^2$
     1                 72                       120                    + 48          2304
     2                 67                        81                    + 14           196
     3                 90                       110                    + 20           400
     4                 97                       103                    +  6            36
     5                 84                       109                    + 25           625
     6                 92                       137                    + 45          2025
     7                 65                       115                    + 50          2500
     8                 75                        82                    +  7            49
     9                 80                       110                    + 30           900
    10                 69                        89                    + 20           400
   Total                                                               + 265         9435

2. Alternative method :

(a) The difference D is worked out between the paired scores ($X_1$ and $X_2$) of each individual and entered in Table 7.6. These differences for all the pairs of scores are totalled to give $\Sigma D$.

(b) Each such difference is squared to give $D^2$. All such $D^2$ values are entered in Table 7.6 and totalled to give $\Sigma D^2$.

(c) The t score is computed, using $\Sigma D$, $\Sigma D^2$ and the group size (n = 10).

$$t = \frac{\Sigma D}{\sqrt{\dfrac{n \Sigma D^2 - (\Sigma D)^2}{n - 1}}} = \frac{265}{\sqrt{\dfrac{10 \times 9435 - (265)^2}{10 - 1}}} = 5.118 ; \quad df = n - 1 = 10 - 1 = 9.$$

The t score, computed by either of the two methods, is then compared with the two-tail critical t scores (df = 9) for different levels of significance (Table B of Appendix).

$t_{.05(9)} = 2.262$ ; $t_{.02(9)} = 2.821$ ; $t_{.01(9)} = 3.250$ ; $t_{.001(9)} = 4.781$.

As the computed t exceeds even the critical $t_{.001}$, the probability P of correctness of the $H_0$ is lower than 0.001 and is considered too low. So, the $H_0$ cannot be retained, and it is inferred that practice produces a significant difference in achievement test scores (P < 0.001).
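The comparison of a computed t with a row of tabulated critical values can be written as a small helper. This is only a sketch of the decision rule used in the interpretation above; the function name is hypothetical, and the critical values are the two-tail values for df = 9 quoted in the text.

```python
def smallest_significant_level(t_computed, critical):
    """Return the smallest tabulated level whose critical t is equalled or
    exceeded by |t| (two-tail comparison), or None if H0 is retained."""
    passed = [alpha for alpha, t_crit in critical.items()
              if abs(t_computed) >= t_crit]
    return min(passed) if passed else None

# Two-tail critical t scores for df = 9, as quoted from Table B of the Appendix.
two_tail_df9 = {0.05: 2.262, 0.02: 2.821, 0.01: 3.250, 0.001: 4.781}

level = smallest_significant_level(5.118, two_tail_df9)
weak = smallest_significant_level(1.5, two_tail_df9)
print(level, weak)
```

For a one-tail test the sign of t matters, so the absolute value used here would have to be replaced by a check in the predicted direction.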


Example 7.7.9.

The amounts of ethereal sulfates (mg) in 24 hours' urine of 8 individuals, kept successively on low and high protein diets, are given below. Is there a significant difference in the urinary excretion of ethereal sulfates due to changes in the dietary protein content ?

Individuals             :    1    2    3    4    5    6    7    8
Ethereal sulfates
(i) low-protein diet    :   90  102  114  103   95  115  106   96
(ii) high-protein diet  :  123  125  138  115  132  141  126  121




Solution :

The $H_0$ proposes that there is no significant difference between the paired observations, any observed difference arising from mere chances of random sampling. A two-tail t test by the difference method is applied to the paired observations of the small single group to find the probability P of the $H_0$ being correct.

1. First method :

(a) The difference D is worked out between the paired scores ($X_1$ and $X_2$) of each individual and entered in Table 7.7. These differences of all the pairs are totalled to give $\Sigma D$, which is used in working out the mean difference $\bar{D}$.

$$n = 8 ; \quad \bar{D} = \frac{\Sigma D}{n} = \frac{200}{8} = 25 \text{ mg}.$$

Table 7.7. Table for t test of urinary ethereal sulfates by the difference method, using the mean difference.

Individuals   Low-protein diet ($X_1$)   High-protein diet ($X_2$)   D = ($X_2 - X_1$)   $D - \bar{D}$   $(D - \bar{D})^2$
     1                  90                         123                     + 33               +  8              64
     2                 102                         125                     + 23               -  2               4
     3                 114                         138                     + 24               -  1               1
     4                 103                         115                     + 12               - 13             169
     5                  95                         132                     + 37               + 12             144
     6                 115                         141                     + 26               +  1               1
     7                 106                         126                     + 20               -  5              25
     8                  96                         121                     + 25                  0               0
   Total                                                                   + 200                               408

(b) The deviation of each D from $\bar{D}$ is worked out and entered in Table 7.7. Each $(D - \bar{D})$ so obtained is squared, and the squared deviation $(D - \bar{D})^2$ is also entered in that table. The sum $\Sigma (D - \bar{D})^2$ of all the squared deviations is used in computing the SD of the differences ($s_D$), which is used in turn in computing the SE ($s_{\bar{D}}$) of $\bar{D}$.

$$s_D = \sqrt{\frac{\Sigma (D - \bar{D})^2}{n - 1}} = \sqrt{\frac{408}{8 - 1}} = 7.63 \text{ mg} ; \quad s_{\bar{D}} = \frac{s_D}{\sqrt{n}} = \frac{7.63}{\sqrt{8}} = 2.70 \text{ mg}.$$

(c) $\bar{D}$ is then converted to t score, using $s_{\bar{D}}$.

$$t = \frac{\bar{D}}{s_{\bar{D}}} = \frac{25}{2.70} = 9.259 ; \quad df = n - 1 = 8 - 1 = 7.$$


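2. Alternative method :

(a) The difference D is worked out between the paired scores ($X_1$ and $X_2$) of each individual and entered in Table 7.8. These differences are totalled for all the pairs to give $\Sigma D$.

(b) Each such difference D is squared to give $D^2$. All the $D^2$ values are entered in Table 7.8 and totalled to give $\Sigma D^2$.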




Table 7.8. Table for t test of ethereal sulfates by the difference method, using the squared differences.

Individuals    $X_1$    $X_2$    D = ($X_2 - X_1$)    $D^2$
     1           90      123          + 33            1089
     2          102      125          + 23             529
     3          114      138          + 24             576
     4          103      115          + 12             144
     5           95      132          + 37            1369
     6          115      141          + 26             676
     7          106      126          + 20             400
     8           96      121          + 25             625
   Total                              + 200           5408


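(c) The t score is computed using $\Sigma D$, $\Sigma D^2$ and n (= 8).

$$t = \frac{\Sigma D}{\sqrt{\dfrac{n \Sigma D^2 - (\Sigma D)^2}{n - 1}}} = \frac{200}{\sqrt{\dfrac{8 \times 5408 - (200)^2}{8 - 1}}} = 9.262 ; \quad df = n - 1 = 8 - 1 = 7.$$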


Interpretation :

The t score, computed by either of the two methods, is next compared with the two-tail critical t scores (df = 7) for different levels of significance (Table B of Appendix).

$t_{.05(7)} = 2.365$ ; $t_{.02(7)} = 2.998$ ; $t_{.01(7)} = 3.499$ ; $t_{.001(7)} = 5.405$.

As the computed t far exceeds the critical $t_{.001}$, the probability P of correctness of the $H_0$ is less than 0.001 and is considered too low. So, the $H_0$ is rejected, and it is inferred that changes in dietary protein content produce a significant difference in the urinary ethereal sulfate excretion (P < 0.001).


Example 7.7.10. 

Hourly oxygen consumptions (in ml per 100 g of body-weight) were found to be as follows in a sample of nine parakeets, respectively before and after exposure to a pesticide.

Individual                 :    1    2    3    4    5    6    7    8    9
Before pesticide ($X_1$)   :  182  157  173  185  175  168  179  159  170
After pesticide ($X_2$)    :  157  122  140  166  138  148  148  130  147

Find whether or not the oxygen consumption is significantly higher before exposure to the pesticide than after exposure to it.

Solution : 

The $H_0$ contends that the oxygen consumption is not significantly higher before exposure than after it. A one-tail t test is undertaken by the difference method to work out the probability P of correctness of this $H_0$, as the paired scores belong to a small single group.



1. First method :

(a) The difference D is worked out between the paired scores ($X_1$ and $X_2$) of each individual and entered in Table 7.9. All these D values are totalled to give $\Sigma D$, which is used in computing the mean difference $\bar{D}$.

$$n = 9 ; \quad \bar{D} = \frac{\Sigma D}{n} = \frac{252}{9} = 28.0 \text{ ml}.$$

Table 7.9. Table for t test of oxygen consumptions by the difference method, using the mean difference.

Individuals   Before pesticide ($X_1$)   After pesticide ($X_2$)   D = ($X_1 - X_2$)   $D - \bar{D}$   $(D - \bar{D})^2$
     1                 182                        157                    + 25               - 3               9
     2                 157                        122                    + 35               + 7              49
     3                 173                        140                    + 33               + 5              25
     4                 185                        166                    + 19               - 9              81
     5                 175                        138                    + 37               + 9              81
     6                 168                        148                    + 20               - 8              64
     7                 179                        148                    + 31               + 3               9
     8                 159                        130                    + 29               + 1               1
     9                 170                        147                    + 23               - 5              25
   Total                                                                 + 252                              344

(b) The deviation of each D value from $\bar{D}$ is worked out and entered in Table 7.9. Each such $(D - \bar{D})$ is squared, and the squared deviation $(D - \bar{D})^2$ is also entered in the table. The sum $\Sigma (D - \bar{D})^2$ of all the squared deviations is used in working out the SD of the differences ($s_D$), which is used in turn in computing the SE ($s_{\bar{D}}$) of $\bar{D}$.

$$s_D = \sqrt{\frac{\Sigma (D - \bar{D})^2}{n - 1}} = \sqrt{\frac{344}{9 - 1}} = 6.557 \text{ ml} ; \quad s_{\bar{D}} = \frac{s_D}{\sqrt{n}} = \frac{6.557}{\sqrt{9}} = 2.186 \text{ ml}.$$

(c) $\bar{D}$ is converted to t score, using $s_{\bar{D}}$.

$$t = \frac{\bar{D}}{s_{\bar{D}}} = \frac{28.0}{2.186} = 12.809 ; \quad df = n - 1 = 9 - 1 = 8.$$


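2. Alternative method :

(a) The difference D is worked out between the paired scores ($X_1$ and $X_2$) of each individual and entered in Table 7.10. All these D values are totalled to give $\Sigma D$.

(b) Each D is squared to give $D^2$. All the $D^2$ values are entered in Table 7.10 and totalled to give $\Sigma D^2$.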




Table 7.10. Table for t test of oxygen consumptions by the difference method, using the squared differences.

Individuals    $X_1$    $X_2$    D = ($X_1 - X_2$)    $D^2$
     1          182      157          + 25             625
     2          157      122          + 35            1225
     3          173      140          + 33            1089
     4          185      166          + 19             361
     5          175      138          + 37            1369
     6          168      148          + 20             400
     7          179      148          + 31             961
     8          159      130          + 29             841
     9          170      147          + 23             529
   Total                              + 252           7400


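(c) The t score is computed, using $\Sigma D$ and $\Sigma D^2$.

$$t = \frac{\Sigma D}{\sqrt{\dfrac{n \Sigma D^2 - (\Sigma D)^2}{n - 1}}} = \frac{252}{\sqrt{\dfrac{9 \times 7400 - (252)^2}{9 - 1}}} = 12.810 ; \quad df = n - 1 = 9 - 1 = 8.$$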


Interpretation :

The t score, computed by either of the two methods, is next compared with the one-tail critical t scores (df = 8) for different levels of significance (Table B of Appendix).

$t_{.05(8)} = 1.860$ ; $t_{.025(8)} = 2.306$ ; $t_{.01(8)} = 2.896$ ; $t_{.005(8)} = 3.355$ ; $t_{.0005(8)} = 5.041$.

As the computed t far exceeds the critical one-tail $t_{.0005}$, the probability P of the $H_0$ being correct is less than 0.0005 and is considered too low. So, the $H_0$ is rejected, and it is inferred that the mean oxygen consumption is significantly higher before exposure to the pesticide than after the exposure (P < 0.0005).


3. t test for paired observations of matched-pair and large single groups

Two equivalent or matched-pair groups may be constituted by including in one group such individuals, each of whom is matched with an individual of the other group with respect to an initially measured variable, either identical with or related to the dependent variable to be studied subsequently. The two matched groups are then exposed to or treated with different levels of the independent variable ; one of the groups may, for example, serve as the control group treated with a placebo devoid of the independent variable (level 1 of treatment), while the other serves as the experimental group exposed to level 2 of treatment consisting of a given dose of the independent variable. The dependent variable is subsequently measured in the individuals of both the groups. The two sets of final scores of the dependent variable constitute paired and correlated observations (page 126). For a t test of the difference between the means of two such equivalent groups, the product-moment correlation coefficient (Pearson's r, vide § 8.2) between the two sets of scores has to be used in computing the SE ($s_{\bar{X}_1 - \bar{X}_2}$) of the difference.




For a single-group experiment using a large group (n ≥ 30), the t test by the difference method, proposed for a small single-group experiment (pages 126-127), is not appropriate. Instead, the correlation coefficient r (vide § 8.2) computed between the paired scores has to be used here also in working out $s_{\bar{X}_1 - \bar{X}_2}$.

Thus, for n pairs of scores of either matched-pair groups or large single groups,

$$r = \frac{\Sigma (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)}{\sqrt{\Sigma (X_1 - \bar{X}_1)^2 \, \Sigma (X_2 - \bar{X}_2)^2}} ;$$

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{(s_{\bar{X}_1})^2 + (s_{\bar{X}_2})^2 - 2 r s_{\bar{X}_1} s_{\bar{X}_2}} ;$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} ; \quad df = n - 1.$$

To find the P of the $H_0$ being correct, the computed t is compared with critical t scores with the same df, but for different levels of significance. The inference is drawn according as the P is considered too low or too high, as in the preceding examples of t test.
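The formulas above can be sketched as a single function that works from summary statistics. The function name and argument order are illustrative, not taken from the text; the figures are those of Example 7.7.12 (trial 2 against trial 1). Because the text rounds the standard errors to two decimals before combining them, the unrounded t (about 4.40) differs slightly from the printed 4.389.

```python
import math

def correlated_means_t(mean_a, mean_b, sd_a, sd_b, r, n):
    """t for the difference between two correlated means (n pairs of scores)."""
    se_a = sd_a / math.sqrt(n)                  # SE of the first mean
    se_b = sd_b / math.sqrt(n)                  # SE of the second mean
    # SE of the difference, reduced by the correlation between the paired scores
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2 - 2 * r * se_a * se_b)
    t = (mean_a - mean_b) / se_diff
    return t, se_diff, n - 1

t, se_diff, df = correlated_means_t(55.7, 48.8, 11.55, 9.87, 0.58, 40)
print(round(t, 2), round(se_diff, 2), df)
```

Setting r to 0 in the same function gives the larger standard error of two uncorrelated means, which shows how much precision the pairing contributes.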


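Example 7.7.11.

The strengths of kneejerk reflexes of a sample of 30 men, under tensed and relaxed conditions respectively, are given in Table 7.11. Find whether or not the mean kneejerk strength differs significantly between the two conditions.

Solution :

The $H_0$ contends that the mean kneejerk strength does not differ significantly between the two conditions. Because it is a large single group, a two-tail t test is undertaken, using the correlation coefficient between the paired scores of the two sets, to work out the probability P of the $H_0$ being correct.

(a) The sum of the scores of each set is used to compute its mean (Table 7.11).

$$\bar{X}_1 = \frac{\Sigma X_1}{n} = \frac{900}{30} = 30 ; \quad \bar{X}_2 = \frac{\Sigma X_2}{n} = \frac{720}{30} = 24.$$

(b) $\bar{X}_1$ and $\bar{X}_2$ are then used in computing respectively the $(X_1 - \bar{X}_1)$ and $(X_2 - \bar{X}_2)$ values, each of which is squared to give respectively $(X_1 - \bar{X}_1)^2$ and $(X_2 - \bar{X}_2)^2$. The sums of squares, viz., $\Sigma (X_1 - \bar{X}_1)^2$ and $\Sigma (X_2 - \bar{X}_2)^2$, are then worked out for the two sets of scores (Table 7.11) for computing the SDs ($s_1$ and $s_2$) of the two sets.

$$s_1 = \sqrt{\frac{\Sigma (X_1 - \bar{X}_1)^2}{n - 1}} = \sqrt{\frac{462}{30 - 1}} = 3.99 ; \quad s_2 = \sqrt{\frac{\Sigma (X_2 - \bar{X}_2)^2}{n - 1}} = \sqrt{\frac{594}{30 - 1}} = 4.53.$$

(c) The SEs of the two means, viz., $s_{\bar{X}_1}$ and $s_{\bar{X}_2}$, are computed, using $s_1$ and $s_2$ respectively.

$$s_{\bar{X}_1} = \frac{s_1}{\sqrt{n}} = \frac{3.99}{\sqrt{30}} = 0.728 ; \quad s_{\bar{X}_2} = \frac{s_2}{\sqrt{n}} = \frac{4.53}{\sqrt{30}} = 0.827.$$

(d) The product of $(X_1 - \bar{X}_1)$ and $(X_2 - \bar{X}_2)$ is worked out for each individual ; these products for all the individuals are totalled to give $\Sigma (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)$. (See Table 7.11.)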




Table 7.11. Table for computing the SDs and the correlation coefficient of kneejerk scores.

              Kneejerk strengths
Individuals   tensed ($X_1$)   relaxed ($X_2$)   $X_1 - \bar{X}_1$   $(X_1 - \bar{X}_1)^2$   $X_2 - \bar{X}_2$   $(X_2 - \bar{X}_2)^2$   $(X_1 - \bar{X}_1)(X_2 - \bar{X}_2)$
     1             30               25                  0                    0                  +  1                   1                      0
     2             25               27                -  5                  25                  +  3                   9                   - 15
     3             31               28                +  1                   1                  +  4                  16                   +  4
     4             24               20                -  6                  36                  -  4                  16                   + 24
     5             31               22                +  1                   1                  -  2                   4                   -  2
     6             28               16                -  2                   4                  -  8                  64                   + 16
     7             27               31                -  3                   9                  +  7                  49                   - 21
     8             34               27                +  4                  16                  +  3                   9                   + 12
     9             36               23                +  6                  36                  -  1                   1                   -  6
    10             35               26                +  5                  25                  +  2                   4                   + 10
    11             37               29                +  7                  49                  +  5                  25                   + 35
    12             30               32                   0                   0                  +  8                  64                      0
    13             32               27                +  2                   4                  +  3                   9                   +  6
    14             28               15                -  2                   4                  -  9                  81                   + 18
    15             26               14                -  4                  16                  - 10                 100                   + 40
    16             29               20                -  1                   1                  -  4                  16                   +  4
    17             24               20                -  6                  36                  -  4                  16                   + 24
    18             25               30                -  5                  25                  +  6                  36                   - 30
    19             31               22                +  1                   1                  -  2                   4                   -  2
    20             34               25                +  4                  16                  +  1                   1                   +  4
    21             35               27                +  5                  25                  +  3                   9                   + 15
    22             28               20                -  2                   4                  -  4                  16                   +  8
    23             24               27                -  6                  36                  +  3                   9                   - 18
    24             31               23                +  1                   1                  -  1                   1                   -  1
    25             35               27                +  5                  25                  +  3                   9                   + 15
    26             30               24                   0                   0                     0                   0                      0
    27             36               26                +  6                  36                  +  2                   4                   + 12
    28             31               25                +  1                   1                  +  1                   1                   +  1
    29             28               22                -  2                   4                  -  2                   4                   +  4
    30             25               20                -  5                  25                  -  4                  16                   + 20
   Total          900              720                                     462                                       594                   + 177


(e) The correlation coefficient (r) is computed between the paired scores of the two sets, using the sum of products and the sums of squares.

$$r = \frac{\Sigma (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)}{\sqrt{\Sigma (X_1 - \bar{X}_1)^2 \, \Sigma (X_2 - \bar{X}_2)^2}} = \frac{177}{\sqrt{462 \times 594}} = + 0.34.$$

(f) $s_{\bar{X}_1 - \bar{X}_2}$, the SE of the difference between the means, is worked out using the computed $s_{\bar{X}_1}$, $s_{\bar{X}_2}$ and r.




$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{(s_{\bar{X}_1})^2 + (s_{\bar{X}_2})^2 - 2 r s_{\bar{X}_1} s_{\bar{X}_2}} = \sqrt{(0.728)^2 + (0.827)^2 - 2 \times 0.34 \times 0.728 \times 0.827} = 0.897.$$

(g) The difference between the means is then converted to t score.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{30 - 24}{0.897} = 6.689 ; \quad df = n - 1 = 30 - 1 = 29.$$

Interpretation :

The computed t is compared with the two-tail critical t scores (df = 29) for different levels of significance (Table B of Appendix).

$t_{.05(29)} = 2.045$ ; $t_{.02(29)} = 2.462$ ; $t_{.01(29)} = 2.756$ ; $t_{.001(29)} = 3.659$.

As the computed t exceeds the critical $t_{.001}$, the probability P of the $H_0$ being correct is less than 0.001 and is considered too low. So, the $H_0$ is rejected, and it is inferred that the mean kneejerk strength in the tensed condition is significantly different from that in the relaxed condition (P < 0.001).


Example 7.7.12.

The scores, obtained by 40 subjects in two successive trials of a sensory-motor test, had the following respective means and standard deviations.

Trial 1 : $\bar{X}_1 = 48.8$ ; $s_1 = 9.87$ ;
Trial 2 : $\bar{X}_2 = 55.7$ ; $s_2 = 11.55$. n = 40.

The correlation coefficient (r) between the paired scores of the two trials worked out to be +0.58. Find whether or not the mean score of trial 2 is significantly higher than that of trial 1.

Solution :

The $H_0$ contends that the mean score of trial 2 is not significantly higher than that of trial 1. To estimate the probability P of correctness of this $H_0$, a one-tail t test is undertaken, using the correlation coefficient between the paired scores belonging to the large single group.

(a) The standard errors, $s_{\bar{X}_1}$ and $s_{\bar{X}_2}$, of the respective means are computed, using the respective standard deviations, and are in turn used in working out the SE of difference ($s_{\bar{X}_1 - \bar{X}_2}$) between the means.

$$s_{\bar{X}_1} = \frac{s_1}{\sqrt{n}} = \frac{9.87}{\sqrt{40}} = 1.56 ; \quad s_{\bar{X}_2} = \frac{s_2}{\sqrt{n}} = \frac{11.55}{\sqrt{40}} = 1.83 ; \quad r = + 0.58.$$

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{(s_{\bar{X}_1})^2 + (s_{\bar{X}_2})^2 - 2 r s_{\bar{X}_1} s_{\bar{X}_2}} = \sqrt{(1.56)^2 + (1.83)^2 - 2 \times 0.58 \times 1.56 \times 1.83} = 1.572.$$

(b) The difference ($\bar{X}_2 - \bar{X}_1$) between the means is converted to t score.

$$t = \frac{\bar{X}_2 - \bar{X}_1}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{55.7 - 48.8}{1.572} = 4.389 ; \quad df = n - 1 = 40 - 1 = 39.$$

The computed t is compared with the one-tail critical t scores (df = 39) for different levels of significance.

$t_{.05(39)} = 1.685$ ; $t_{.01(39)} = 2.426$ ; $t_{.005(39)} = 2.708$ ; $t_{.0005(39)} = 3.558$.

As the computed t exceeds the critical $t_{.0005}$, the probability P of the $H_0$ being correct is less than 0.0005 and is considered too low. So, the $H_0$ is rejected, and it is inferred that the mean score of trial 2 is significantly higher than that of trial 1 (P < 0.0005).

GLOSSARY

alternative hypothesis : the hypothesis which proposes that the result of the experiment is significant.

equivalent-group experiment : an experiment in which groups to be exposed to different levels of the independent variable are initially matched with respect to a variable either identical with or related to the dependent variable.

errors of inference : errors in either accepting or rejecting the null hypothesis depending on probabilities.

homoscedasticity : the assumption that the groups, drawn for an experiment, initially have homogeneous variances differing only due to sampling errors.

independent group experiment : an experiment in which groups to be exposed to different levels of the independent variable consist of such separate sets of individuals as are not related to each other.

level of significance : the highest level of probability up to which the probability of correctness of the null hypothesis ($H_0$) is considered so low that the $H_0$ is rejected.

null hypothesis : the hypothesis which contests the alternative hypothesis and proposes that the result of the experiment is not significant, being merely due to the chance use of a particular random sample.

one-tail test : a test for finding whether or not the mean of one group is significantly higher (alternatively, lower) than that of another in an experiment.

paired observations : observations of a single-group or equivalent-group experiment where each score of one set of observations is paired with a score of another set, giving rise to a correlation between the two sets of scores.

single-group experiment : an experiment in which the same group of individuals is exposed successively to different levels of the independent variable.

two-tail test : a test for finding whether or not there is a significant difference between the means of two groups in an experiment, irrespective of which of them is higher or lower than the other.

type I error : error of inference owing to the wrongful rejection of a correct null hypothesis, depending on the level of significance.

type II error : error of inference owing to the wrongful acceptance of a wrong null hypothesis.






8. CORRELATION AND REGRESSION 


Correlation coefficients measure quantitatively the relationship between variables. Regression predicts the most likely value of a variable from the value(s) of one or more other variables.

Bivariate statistics analyze the data of two 
variables in a sample, measure the relationship 
between two variables, or predict the value of 
one variable from the given value of another 
variable in the same individual. Multivariate 
statistics analyze the data of more than two 
variables, measure their relationships, or predict 
the most likely value of one variable from the 
given values of two or more others. 


8.1 CORRELATION 


Correlation explores the magnitude and direction of association between two or more variables, i.e., how far the variations of a variable are related to those of one or more other variables in the same individual. It thus gives the magnitude and the algebraic sign of concomitant variations of variables.


Types

(a) Correlation is either linear or nonlinear according as the relation between the variables can be described by a straight or a curved line. In linear correlation, the magnitude of change of one variable bears a constant ratio to that of the other variable(s).

(b) Correlation may be positive if the high and low magnitudes of one variable are associated with respectively the high and low magnitudes of the other ; it is negative if the high magnitude of one is associated with the low magnitude of the other.

(c) Both linear and nonlinear correlations may be simple or multiple, according as they measure the relation either between two variables or between one variable and the weighted sum of two or more others.

For example, the correlation between body surface area and the weighted sum of height and weight is a multiple correlation. The correlation between the total RBC count and the blood haemoglobin concentration is simple and linear, while that between the substrate concentration and the initial velocity of enzyme action is simple but nonlinear, either hyperbolic or sigmoid.


Properties

1. Correlation holds good only within the limits of the population and other conditions in which it is estimated. It cannot be generalized beyond those limits.

2. It may not indicate any cause and effect relationship between the variables. It merely indicates an association between their changes without inferring whether or not the change in one has caused the change in the other. It is possible that the correlation has resulted from the common influence of some other variables on the correlated variables.

3. It may be used to predict the value of one variable from that of another.

4. Correlation holds good even if the variables are free to vary at random, without being fixed or deliberately controlled by the investigator.

5. Correlation coefficients suffer from sampling error. Correlation coefficients of samples drawn from a population form a sampling distribution around the parametric correlation coefficient as the mean. The SD of such a sampling



distribution is the SE of the correlation coefficient, which is used in testing the significance of the computed correlation coefficient.

8.2 PRODUCT-MOMENT CORRELATION

The product-moment correlation coefficient (r), formulated by the British mathematician Karl Pearson (1857-1936), is a measure of the magnitude and direction (algebraic sign) of the relation between two variables in a sample when their relationship can be described by a straight line. It is thus a coefficient of simple linear correlation. Where X and Y are the scores of the variables, $\bar{X}$ and $\bar{Y}$ are their respective means, $s_X$ and $s_Y$ are the respective standard deviations, $z_X$ and $z_Y$ are the standard scores of X and Y respectively, and n is the number of individuals (or pairs of scores) in the sample,

$$z_X = \frac{X - \bar{X}}{s_X} ; \quad z_Y = \frac{Y - \bar{Y}}{s_Y} ;$$

$$r = \frac{\Sigma z_X z_Y}{n} = \frac{1}{n} \Sigma \left( \frac{X - \bar{X}}{s_X} \right) \left( \frac{Y - \bar{Y}}{s_Y} \right) ;$$

$$r = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{n s_X s_Y}.$$

This is the basic formula for r.
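The basic formula can be sketched as the mean product of paired standard scores. The sketch assumes that the SDs in this formula are computed with n, not n - 1, in the denominator, which is what makes a perfect linear relation come out as exactly 1; the function name is illustrative, and the data are the oxygen consumption and ventilation scores of Example 8.2.1, for which r should come out near +0.85.

```python
import math

def pearson_r_basic(x, y):
    """Pearson's r as the mean product of paired z scores (SDs taken over n)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    z_x = [(v - mean_x) / sd_x for v in x]
    z_y = [(v - mean_y) / sd_y for v in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / n

o2 = [281, 246, 369, 330, 258, 315, 246, 330, 298]                 # ml
ventilation = [6.55, 7.10, 9.00, 8.50, 6.00, 7.00, 6.50, 9.00, 7.50]  # litres

r = pearson_r_basic(o2, ventilation)
print(round(r, 2))
```

The same value is obtained if both SDs and the divisor use n - 1 instead, because the factor cancels between numerator and denominator.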

Assumptions

To compute and use the Pearson's r, it should be reasonable to assume that :

(a) both the variables are continuous measurement variables of either interval or ratio type, r being inapplicable to nominal, discontinuous and ordinal variables ;

(b) the variables have unimodal and fairly symmetrical distributions in the population without marked skewness, although they need not be normally distributed ;

(c) the paired scores of the two variables in each individual are independent of such paired scores of the other individuals ;

(d) there exists a linear relationship between the scores of the variables. In other words, the scattergram of their scores fits with a linear model (page 34).


Properties

1. The magnitude and the algebraic sign of r indicate respectively the degree and the direction of linear relationship between the variables. The value of r ranges between -1.00 and +1.00, and does not bear the unit of any of the variables. A value of 0 indicates the absence of any linear correlation. The closer the value of r is to -1.00 or +1.00, the stronger or higher is the relation between the variables.

2. Even if a constant number is added to or subtracted from all the scores of the two variables, or if they are divided or multiplied by a positive constant number, Pearson's r remains the same.

3. A positive r indicates that an individual, having a high score in one variable, is likely to possess a high score in the other variable ; on the other hand, an individual with a low score in one is expected to have a low score in the other too. A perfect positive correlation is indicated if r amounts to +1.00. In such a case, each individual has the same $z_X$ and $z_Y$ scores for his X and Y scores.

$$z_X = z_Y ;$$

or, $X = \bar{X} + (Y - \bar{Y}) s_X / s_Y$, and $Y = \bar{Y} + (X - \bar{X}) s_Y / s_X$.

A negative r indicates, on the contrary, that a high score of one variable is likely to be associated with a low score of the other, and vice versa. For a perfect negative correlation, r amounts to -1.00. For each individual in such a case,





$$z_X = - z_Y ;$$

or, $X = \bar{X} - (Y - \bar{Y}) s_X / s_Y$, and $Y = \bar{Y} - (X - \bar{X}) s_Y / s_X$.

Seldom, however, is a perfect correlation like +1.00 or -1.00 obtained in practice. In many cases, a correlation of 0.60 is considered quite high ; even a much lower correlation like 0.30 may be useful if the probability (P) of its random occurrence is too low.

4. Due to random variations of the variables in samples, r suffers from sampling errors and varies from sample to sample drawn from the same population. So, sample r values form a sampling distribution around the parametric correlation coefficient ($\rho$) (Fig. 8.1). The SD of this sampling distribution is the SE ($s_r$) of r.

Fig. 8.1. Three sampling distributions of r.

5. r (correlation coefficient) does not directly give the percentage relationship or dependence of one variable on another, unless its value is +1.00, -1.00 or 0.00. In many cases, however, the squared value of r gives the proportion of dependence of the variables and is called the coefficient of determination. Thus, if r is 0.60, only a 0.36 proportion of the variance of one variable may be associated with the variance of the other ; the remaining proportion, 0.64 here, is the coefficient of nondetermination, indicating such proportion of the variance of one variable as is not associated with that of the other. In some cases, however, the percentage dependence has a value different from $r^2$.


Computations from ungrouped data

1. Using unbiased SD :

It may be recalled that for ungrouped data of small samples, unbiased standard deviations ($s_X$ and $s_Y$) of variables X and Y are computed using the degrees of freedom, viz., (n - 1), instead of the sample size (page 52). For such samples, Pearson's r is computed between the variables X and Y by the modified basic formula using their respective unbiased SDs ($s_X$ and $s_Y$), the sample size (n), and the sum of products $\Sigma (X - \bar{X})(Y - \bar{Y})$. (Example 8.2.1.)

$$s_X = \sqrt{\frac{\Sigma (X - \bar{X})^2}{n - 1}} ; \quad s_Y = \sqrt{\frac{\Sigma (Y - \bar{Y})^2}{n - 1}} ; \quad r = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{(n - 1) s_X s_Y}.$$

2. Using covariance :

The covariance, Cov (X,Y), of two variables is computed from the products of the deviations of the scores of those variables from their respective means.

$$\text{Cov}(X,Y) = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{n - 1} , \quad \text{or,} \quad \text{Cov}(X,Y) = \frac{\Sigma XY - \dfrac{\Sigma X \, \Sigma Y}{n}}{n - 1} ;$$

and

$$r = \frac{\text{Cov}(X,Y)}{s_X s_Y}. \quad \text{(See Examples 8.2.1. to 8.2.3.)}$$

3. Using the sums of squares :

The product-moment r can also be computed using the sum of products as the numerator and the sums of squares, viz., $\Sigma (X - \bar{X})^2$ and $\Sigma (Y - \bar{Y})^2$, in the denominator. Substituting the expressions for $s_X$ and $s_Y$ in $r = \Sigma (X - \bar{X})(Y - \bar{Y}) / [(n - 1) s_X s_Y]$,

$$r = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{\sqrt{\Sigma (X - \bar{X})^2 \, \Sigma (Y - \bar{Y})^2}}.$$

4. Using the raw scores :

The sum of products and the sums of squares can be obtained from the raw scores (X and Y) of the variables. So, r can also be worked out directly from the raw scores. (See Example 8.2.2.)

$$\text{Cov}(X,Y) = \frac{\Sigma XY - \dfrac{\Sigma X \, \Sigma Y}{n}}{n - 1} ; \quad r = \frac{\text{Cov}(X,Y)}{s_X s_Y} ;$$

$$r = \frac{n \Sigma XY - \Sigma X \, \Sigma Y}{\sqrt{[n \Sigma X^2 - (\Sigma X)^2][n \Sigma Y^2 - (\Sigma Y)^2]}}.$$


Significance of r

1. Test of $H_0$ that $\rho$ is zero :

In this case, the $H_0$ contends that the parametric (population) correlation coefficient ($\rho$) amounts to zero, and that the deviation of the computed r from 0 has resulted from mere chances of random sampling. Because r has a bilaterally symmetrical (Fig. 8.1) sampling distribution when $\rho = 0$, the computed r can be transformed into the t score for interpretation, using the standard error ($s_r$) of r. The df of the computed r should be taken as n - 2, because any two pairs of raw scores lose their freedom for change in order to keep $\bar{X}$ and $\bar{Y}$ constant as estimates of the respective population means in the computation of r.

$$df = n - 2 ; \quad s_r = \sqrt{\frac{1 - r^2}{n - 2}} ; \quad t = \frac{r}{s_r}.$$

The computed t score is compared, for either the one-tail or the two-tail test, with the critical t scores with the same df, but for different levels of significance. If the computed t is lower than the critical t for the 0.05 level, the $H_0$ cannot be rejected and the computed r is not significant.

2. Test of $H_0$ that $\rho$ has a given value other than 0 : Fisher's z transformation

The $H_0$ contends here that $\rho$ has a specified value, other than zero, and that any observed deviation of the computed r from this given value has resulted from chances of random sampling. Student's t test cannot be used here because r has asymmetric and skewed sampling distributions if $\rho \neq 0$. In such cases, the computed r is logarithmically transformed into Fisher's $z_r$, which has bilaterally symmetrical and approximately normal sampling distributions around its parameter $\zeta$ irrespective of the values of $\rho$ and $\zeta$ (Fig. 8.2). So, the computed r and the given $\rho$ are logarithmically transformed into $z_r$ and $\zeta$ respectively. The deviation of $z_r$ from $\zeta$ is converted to the standard z score, which is referred to the unit normal curve areas for working out the probability (P) of the $H_0$ being correct.

$$z_r = 1.1513 \log_{10} \frac{1 + r}{1 - r} ; \quad \zeta = 1.1513 \log_{10} \frac{1 + \rho}{1 - \rho} ;$$

$$s_z = \frac{1}{\sqrt{n - 3}} ; \quad z = \frac{z_r - \zeta}{s_z} ;$$

P = 2 [0.5000 - (area of unit normal curve from its $\mu$ to the computed z)].

Example 8.2.1.

Find whether or not there is a significant correlation between $O_2$ consumption (ml per minute) and pulmonary minute ventilation (litres per minute), using the following data.

Individuals          :     1     2     3     4     5     6     7     8     9
$O_2$ consumption    :   281   246   369   330   258   315   246   330   298
Ventilation          :  6.55  7.10  9.00  8.50  6.00  7.00  6.50  9.00  7.50

Solution :

1. First method :

(a) The scores of O2 consumption ($X$) and ventilation ($Y$) are used to compute their means ($\bar{X}$ and $\bar{Y}$)
and unbiased SDs ($s_x$ and $s_y$) (Table 8.1).

Table 8.1. Table for computing r using unbiased SDs.

          X       Y    X - X̄    Y - Ȳ   (X - X̄)²   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
        281    6.55     -16    -0.91       256     0.8281       + 14.56
        246    7.10     -51    -0.36      2601     0.1296       + 18.36
        369    9.00     +72    +1.54      5184     2.3716       +110.88
        330    8.50     +33    +1.04      1089     1.0816       + 34.32
        258    6.00     -39    -1.46      1521     2.1316       + 56.94
        315    7.00     +18    -0.46       324     0.2116       -  8.28
        246    6.50     -51    -0.96      2601     0.9216       + 48.96
        330    9.00     +33    +1.54      1089     2.3716       + 50.82
        298    7.50     + 1    +0.04         1     0.0016       +  0.04
Total  2673   67.15                      14666    10.0489       +326.60

$$\bar{X} = \frac{\sum X}{n} = \frac{2673}{9} = 297.0 \text{ ml}\,;\qquad \bar{Y} = \frac{\sum Y}{n} = \frac{67.15}{9} = 7.46 \text{ L}\,;$$




$$s_x = \sqrt{\frac{\sum (X-\bar{X})^2}{n-1}} = \sqrt{\frac{14666}{9-1}} = 42.82 \text{ ml}\,;\qquad s_y = \sqrt{\frac{\sum (Y-\bar{Y})^2}{n-1}} = \sqrt{\frac{10.0489}{9-1}} = 1.12 \text{ L}.$$

(b) The products of ($X - \bar{X}$) and ($Y - \bar{Y}$) of all individuals are totalled to give $\sum (X-\bar{X})(Y-\bar{Y})$,
amounting to 326.60 (Table 8.1).

(c) $r$ is then computed as follows.

$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{(n-1)\,s_x s_y} = \frac{326.60}{(9-1)\times 42.82\times 1.12} = +0.85.$$

2. Alternative method :

$$\mathrm{Cov}(X,Y) = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n-1} = \frac{326.60}{9-1} = 40.83\,;\qquad r = \frac{\mathrm{Cov}(X,Y)}{s_x s_y} = \frac{40.83}{42.82\times 1.12} = +0.85.$$

Student's t is computed from the product-moment r worked out by either of the above two methods.

$$s_r = \sqrt{\frac{1-r^2}{n-2}} = \sqrt{\frac{1-(0.85)^2}{9-2}} = 0.199\,;\qquad t = \frac{r}{s_r} = \frac{0.85}{0.199} = 4.271\,;\qquad df = n-2 = 7.$$

Two-tail critical t scores (df = 7) are quoted from Table B of Appendix.

$t_{.05(7)} = 2.365$ ; $t_{.02(7)} = 2.998$ ; $t_{.01(7)} = 3.499$ ; $t_{.001(7)} = 5.405$.

As the computed t is higher than the critical $t_{.01}$, but lower than the critical $t_{.001}$, the probability P of
getting the computed r by chances of random sampling is less than 0.01, but higher than 0.001 (0.01 > P >
0.001). So, the variables have a significant correlation below the 0.01 level (P < 0.01).
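
The computation of Example 8.2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pearson_r` and `t_for_r` are made up for this sketch, and the data are the nine pairs of scores listed in the example.

```python
import math

def pearson_r(xs, ys):
    """Product-moment r from the sums of squares and products of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def t_for_r(r, n):
    """Student's t for the H0 that rho is zero, with df = n - 2."""
    s_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / s_r, n - 2

# Scores of Example 8.2.1 (O2 consumption in ml, ventilation in litres).
o2_consumption = [281, 246, 369, 330, 258, 315, 246, 330, 298]
ventilation = [6.55, 7.10, 9.00, 8.50, 6.00, 7.00, 6.50, 9.00, 7.50]

r = pearson_r(o2_consumption, ventilation)
t, df = t_for_r(r, len(o2_consumption))
print(f"r = {r:+.2f}, t = {t:.2f}, df = {df}")
```

Because the sketch carries the unrounded r into the t formula, its t comes out slightly above the 4.271 obtained in the text from r rounded to 0.85; the comparison with the critical t scores is unaffected.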


Example 8.2.2.

Compute Pearson's r using either (i) the covariance or (ii) the raw scores to find whether or not there is
a significant correlation between height (cm) and weight (kg) in the following data from 9 college students.

Student    :     1      2      3      4      5      6      7      8      9
Height (X) :   165    182    170    162    160    165    170    170    165
Weight (Y) :  58.5   60.0   52.0   48.5   49.5   59.0   49.0   56.0   58.0

Solution :

1. First method using covariance :

(a) The X and Y scores are totalled separately to give $\sum X$ and $\sum Y$ (Table 8.2).

(b) Each score is squared and the squared scores of each variable are totalled to give $\sum X^2$ and $\sum Y^2$,
respectively (Table 8.2).

(c) Paired X and Y scores of each individual are multiplied with each other and all these products
are totalled to give $\sum XY$ (Table 8.2).




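(d) The SDs of X and Y scores as well as Cov (X,Y) are computed.

$$s_x = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2} = \sqrt{\frac{253343}{9} - \left(\frac{1509}{9}\right)^2} = 6.09 \text{ cm}\,;$$

$$s_y = \sqrt{\frac{\sum Y^2}{n} - \left(\frac{\sum Y}{n}\right)^2} = \sqrt{\frac{26910.75}{9} - \left(\frac{490.5}{9}\right)^2} = 4.45 \text{ kg}\,;$$

$$\mathrm{Cov}(X,Y) = \frac{\sum XY}{n} - \frac{\sum X}{n}\cdot\frac{\sum Y}{n} = \frac{82344.5}{9} - \frac{1509}{9}\times\frac{490.5}{9} = 11.56.$$

(e) Pearson's r is computed using Cov (X,Y), $s_x$ and $s_y$.

$$r = \frac{\mathrm{Cov}(X,Y)}{s_x s_y} = \frac{11.56}{6.09\times 4.45} = +0.43.$$

2. Alternative method using raw scores directly :

The computations of $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$ and $\sum XY$ are done as in steps (a) to (c) given in the first method
(see above and Table 8.2).

Table 8.2. Table for computing r from covariance and also from raw scores.

           X       Y       X²         Y²        XY
         165    58.5    27225    3422.25    9652.5
         182    60.0    33124    3600.00   10920.0
         170    52.0    28900    2704.00    8840.0
         162    48.5    26244    2352.25    7857.0
         160    49.5    25600    2450.25    7920.0
         165    59.0    27225    3481.00    9735.0
         170    49.0    28900    2401.00    8330.0
         170    56.0    28900    3136.00    9520.0
         165    58.0    27225    3364.00    9570.0
Total   1509   490.5   253343   26910.75   82344.5

$$r = \frac{n\sum XY - \sum X\sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{9\times 82344.5 - 1509\times 490.5}{\sqrt{\left[9\times 253343 - (1509)^2\right]\left[9\times 26910.75 - (490.5)^2\right]}} = +0.43.$$

Significance of computed r :

$$s_r = \sqrt{\frac{1-r^2}{n-2}} = \sqrt{\frac{1-(0.43)^2}{9-2}} = 0.341\,;\qquad t = \frac{r}{s_r} = \frac{0.43}{0.341} = 1.261\,;\qquad df = n-2 = 7.$$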





On referring to Table B of Appendix, it is seen that even the two-tail critical $t_{.05(7)}$ of 2.365 exceeds the
computed t. So, there is no significant correlation between the variables (P > 0.05).
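
The raw-score and covariance routes of Example 8.2.2 give the same r, which can be checked with the sketch below. The function names `pearson_r_raw` and `covariance_n` are illustrative only, and the SDs and covariance here use n as the divisor, as in the working of this example.

```python
import math

def pearson_r_raw(xs, ys):
    """r computed directly from the raw-score totals."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def covariance_n(xs, ys):
    """Cov(X, Y) = (sum XY)/n - (sum X / n)(sum Y / n)."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)

# Heights (cm) and weights (kg) of the 9 students in Example 8.2.2.
heights = [165, 182, 170, 162, 160, 165, 170, 170, 165]
weights = [58.5, 60.0, 52.0, 48.5, 49.5, 59.0, 49.0, 56.0, 58.0]

r_raw = pearson_r_raw(heights, weights)
cov = covariance_n(heights, weights)
s_x = math.sqrt(covariance_n(heights, heights))  # SD of X with divisor n
s_y = math.sqrt(covariance_n(weights, weights))  # SD of Y with divisor n
r_cov = cov / (s_x * s_y)
print(f"r (raw scores) = {r_raw:+.2f}, r (covariance) = {r_cov:+.2f}")
```

The raw-score formula avoids working out deviations from the means, which is why it is convenient for hand computation from a table of totals.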


Example 8.2.3.

Find whether or not there is a significant correlation between vocabulary test scores and typewriting test
scores.

Student           :   1    2    3    4    5    6    7    8    9   10
Vocabulary score  :   8   22   35   19   23   13    2   14   11   25
Typewriting score :  29   48   55   49   53   41   22   38   35   48

Solution :

1. First method using the sums of squares :

(a) The means of vocabulary (X) and typewriting (Y) scores, the squared deviations, viz., $(X - \bar{X})^2$ and
$(Y - \bar{Y})^2$, of these scores from the respective means, and the products $(X - \bar{X})(Y - \bar{Y})$ of these deviations
of scores are worked out (Table 8.3).

$$\bar{X} = \frac{\sum X}{n} = \frac{172}{10} = 17.2\,;\qquad \bar{Y} = \frac{\sum Y}{n} = \frac{418}{10} = 41.8.$$

Table 8.3. Table for computing r using the sums of squares.

          X     Y    X - X̄    Y - Ȳ   (X - X̄)²   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
          8    29    - 9.2    -12.8     84.64     163.84       +117.76
         22    48    + 4.8    + 6.2     23.04      38.44       + 29.76
         35    55    +17.8    +13.2    316.84     174.24       +234.96
         19    49    + 1.8    + 7.2      3.24      51.84       + 12.96
         23    53    + 5.8    +11.2     33.64     125.44       + 64.96
         13    41    - 4.2    - 0.8     17.64       0.64       +  3.36
          2    22    -15.2    -19.8    231.04     392.04       +300.96
         14    38    - 3.2    - 3.8     10.24      14.44       + 12.16
         11    35    - 6.2    - 6.8     38.44      46.24       + 42.16
         25    48    + 7.8    + 6.2     60.84      38.44       + 48.36
Total   172   418                      819.60    1045.60       +867.40

(b) r is computed using the sum of the products of deviations, and the sums of squares (sums of the
squared deviations) of variables.

$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}} = \frac{867.40}{\sqrt{819.60\times 1045.60}} = +0.94.$$

2. Alternative method using covariance :

Alternatively, Cov (X,Y), $s_x$ and $s_y$ may be computed and used in computing r.

$$s_x = \sqrt{\frac{\sum (X-\bar{X})^2}{n-1}} = \sqrt{\frac{819.60}{10-1}} = 9.54\,;\qquad s_y = \sqrt{\frac{\sum (Y-\bar{Y})^2}{n-1}} = \sqrt{\frac{1045.60}{10-1}} = 10.78\,;$$




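$$\mathrm{Cov}(X,Y) = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n-1} = \frac{867.40}{10-1} = 96.38\,;\qquad r = \frac{\mathrm{Cov}(X,Y)}{s_x s_y} = \frac{96.38}{9.54\times 10.78} = +0.94.$$

Student's t is computed from the product-moment r worked out by either of the two methods, to test the
$H_0$ of no correlation.

$$s_r = \sqrt{\frac{1-r^2}{n-2}} = \sqrt{\frac{1-(0.94)^2}{10-2}} = 0.121\,;\qquad t = \frac{r}{s_r} = \frac{0.94}{0.121} = 7.769\,;\qquad df = n-2 = 10-2 = 8.$$

On consulting Table B of Appendix, the computed t is found to exceed even the critical
$t_{.001(8)}$ which amounts to 5.041. So, the $H_0$ is rejected and the variables are inferred to have a significant correlation (P <
0.001).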

Example 8.2.4.

Find whether or not there is a significant correlation between the following gill weights (mg) and body
weights (g) of a sample of 10 crabs.

Individuals  :     1     2      3      4      5      6      7      8      9     10
Gill weights :    70    90    120    160    200    220    220    232    300    310
Body weights :  3.90  4.82  13.60  14.40  14.82  15.20  15.40  16.11  14.92  16.72

Solution :

1. First method using raw scores :

(a) The gill weight (X mg) and the body weight (Y g) scores are totalled to give $\sum X$ and $\sum Y$, respectively
(Table 8.4).

(b) Each score is squared and the squared scores of each variable are totalled to give $\sum X^2$ and $\sum Y^2$,
respectively (Table 8.4).

(c) The paired X and Y scores of each individual are multiplied with each other to give XY and all these
products are totalled to give $\sum XY$ (Table 8.4).

Table 8.4. Table for computing r using raw scores.

Individuals      X        Y        X²          Y²         XY
     1          70     3.90      4900     15.2100     273.00
     2          90     4.82      8100     23.2324     433.80
     3         120    13.60     14400    184.9600    1632.00
     4         160    14.40     25600    207.3600    2304.00
     5         200    14.82     40000    219.6324    2964.00
     6         220    15.20     48400    231.0400    3344.00
     7         220    15.40     48400    237.1600    3388.00
     8         232    16.11     53824    259.5321    3737.52
     9         300    14.92     90000    222.6064    4476.00
    10         310    16.72     96100    279.5584    5183.20
Total         1922   129.89    429724   1880.2917   27735.52

$$r = \frac{n\sum XY - \sum X\sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{10\times 27735.52 - 1922\times 129.89}{\sqrt{\left[10\times 429724 - (1922)^2\right]\left[10\times 1880.2917 - (129.89)^2\right]}} = +0.81.$$


2. Alternative method using sums of squares :

(a) The means of gill weight (X) and body weight (Y) scores, the squared deviations of these scores from
the respective means, and the product of these deviations of scores for each individual are worked out
(Table 8.5).

$$\bar{X} = \frac{\sum X}{n} = \frac{1922}{10} = 192.2 \text{ mg}\,;\qquad \bar{Y} = \frac{\sum Y}{n} = \frac{129.89}{10} = 12.99 \text{ g}.$$

(b) The squared deviations of scores from their respective means are totalled to give the respective sums
of squares, viz., $\sum (X-\bar{X})^2$ and $\sum (Y-\bar{Y})^2$. From Table 8.5 :

$\sum (X-\bar{X})^2$ = 60315.60 mg² ; $\sum (Y-\bar{Y})^2$ = 193.1505 g².

(c) The products of the deviations of scores of the two variables from their respective means are totalled
for all the individuals to give the sum of products, viz., $\sum (X-\bar{X})(Y-\bar{Y})$. From Table 8.5 :

$\sum (X-\bar{X})(Y-\bar{Y})$ = 2770.662.

$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}} = \frac{2770.662}{\sqrt{60315.60\times 193.1505}} = +0.81.$$

Table 8.5. Table for computing r using sums of squares.

           X        Y     X - X̄    (X - X̄)²    Y - Ȳ   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
          70     3.90    -122.2    14932.84    -9.09    82.6281     +1110.798
          90     4.82    -102.2    10444.84    -8.17    66.7489     + 834.974
         120    13.60    - 72.2     5212.84    +0.61     0.3721     -  44.042
         160    14.40    - 32.2     1036.84    +1.41     1.9881     -  45.402
         200    14.82    +  7.8       60.84    +1.83     3.3489     +  14.274
         220    15.20    + 27.8      772.84    +2.21     4.8841     +  61.438
         220    15.40    + 27.8      772.84    +2.41     5.8081     +  66.998
         232    16.11    + 39.8     1584.04    +3.12     9.7344     + 124.176
         300    14.92    +107.8    11620.84    +1.93     3.7249     + 208.054
         310    16.72    +117.8    13876.84    +3.73    13.9129     + 439.394
Total   1922   129.89              60315.60            193.1505     +2770.662

To test the $H_0$ that there is no significant correlation, r is worked out by either of the two methods and
converted to Student's t.





$$s_r = \sqrt{\frac{1-r^2}{n-2}} = \sqrt{\frac{1-(0.81)^2}{10-2}} = 0.207\,;\qquad t = \frac{r}{s_r} = \frac{0.81}{0.207} = 3.913\,;\qquad df = n-2 = 10-2 = 8.$$

Using Table B of Appendix, the computed t is found to be lower than the critical $t_{.001(8)}$ of 5.041, but to
be higher than the critical $t_{.01(8)}$ of 3.355. So, the probability P of correctness of the $H_0$ is higher than 0.001,
but lower than 0.01. Because P < 0.01, the $H_0$ is rejected and it is inferred that there is a significant
correlation between gill weights and body weights.


Example 8.2.5.

(i) Find whether or not there is a significant correlation between the IQ values of the following 10
students and the marks obtained by them in Mathematics in a school examination.

Individuals :    1     2     3     4     5     6     7     8     9    10
IQ values   :   95   102   112    96   134   125   100   108   110    88
Maths marks :   20    43    45    30    65    60    46    50    57    24

(ii) Find whether or not the correlation coefficient obtained in (i) differs significantly from a
parametric correlation coefficient of + 0.60.

Solution :

(i) (a) The means of IQ values (X) and Maths marks (Y), and the squared deviations of these scores from the
respective means are worked out; the squared deviations are totalled to give the sums of squares (Table 8.6).

$$\bar{X} = \frac{1070}{10} = 107.0\,;\qquad \bar{Y} = \frac{440}{10} = 44.0\,;\qquad \sum (X-\bar{X})^2 = 1788\,;\qquad \sum (Y-\bar{Y})^2 = 2080.$$

(b) For each individual, the product of ($X - \bar{X}$) and ($Y - \bar{Y}$) is worked out. All such products are
totalled to give the sum of products, viz., $\sum (X-\bar{X})(Y-\bar{Y})$. From Table 8.6,

$\sum (X-\bar{X})(Y-\bar{Y})$ = 1718.

$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}} = \frac{1718}{\sqrt{1788\times 2080}} = +0.89.$$

Table 8.6. Table for computing r using sums of squares.

           X      Y    X - X̄   (X - X̄)²   Y - Ȳ   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
          95     20     -12       144      -24       576         +288
         102     43     - 5        25      - 1         1         +  5
         112     45     + 5        25      + 1         1         +  5
          96     30     -11       121      -14       196         +154
         134     65     +27       729      +21       441         +567
         125     60     +18       324      +16       256         +288
         100     46     - 7        49      + 2         4         - 14
         108     50     + 1         1      + 6        36         +  6
         110     57     + 3         9      +13       169         + 39
          88     24     -19       361      -20       400         +380
Total   1070    440              1788               2080        +1718

The computed r is converted to Student's t, using the SE ($s_r$) of r.

$$s_r = \sqrt{\frac{1-r^2}{n-2}} = \sqrt{\frac{1-(0.89)^2}{10-2}} = 0.161\,;\qquad t = \frac{r}{s_r} = \frac{0.89}{0.161} = 5.528\,;\qquad df = n-2 = 10-2 = 8.$$

On referring to Table B of Appendix, the computed t is found to be higher than even the critical $t_{.001(8)}$ of 5.041.
So, the probability (P) of the $H_0$ being correct is too low, being lower than 0.001. Hence, the $H_0$ is rejected and
it is inferred that there is a significant correlation between the IQ values and the Maths marks (P < 0.001).

(ii) The $H_0$ proposes that the computed r is not significantly different from the proposed
parametric correlation coefficient ($\rho$) amounting to 0.60.

$H_0$ : $\rho$ = + 0.60 ; $H_A$ : $\rho \neq$ 0.60.

In such a case, Fisher's z transformation has to be used in working out the probability (P) of the $H_0$
being correct (pages 147-148).

(a) The computed r and the proposed $\rho$ are both logarithmically transformed respectively into $z_r$ and $\zeta$.

$$z_r = 1.1513 \log\left[\frac{1+r}{1-r}\right] = 1.1513 \log\left[\frac{1+0.89}{1-0.89}\right] = 1.422.$$

$$\zeta = 1.1513 \log\left[\frac{1+\rho}{1-\rho}\right] = 1.1513 \log\left[\frac{1+0.60}{1-0.60}\right] = 0.693.$$

(b) The difference between $z_r$ and $\zeta$ is converted to the standard z score, using the SE of the difference
($s_z$). The probability (P) of the computed z occurring by chance is then worked out using the unit normal
curve table (Table A of Appendix).

$$s_z = \frac{1}{\sqrt{n-3}} = \frac{1}{\sqrt{10-3}} = 0.3780\,;\qquad z = \frac{z_r - \zeta}{s_z} = \frac{1.422 - 0.693}{0.3780} = 1.93.$$

P = 2 [0.5000 - (area of unit normal curve from its $\mu$ to the computed z)]
= 2 [0.5000 - 0.4732] = 0.054.



Because the P of correctness of the $H_0$ exceeds 0.05, P is considered too high and the $H_0$ is retained. It
is, therefore, inferred that the computed r does not differ significantly from the proposed $\rho$ of + 0.60
(P > 0.05).
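
The test of part (ii) can be sketched as follows. The helper names are hypothetical; `math.atanh` is used for Fisher's transformation on the assumption that it is equivalent to 1.1513 log[(1 + r)/(1 - r)] with common logarithms, and the two-tail normal area is obtained from the complementary error function instead of Table A.

```python
import math

def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)

def two_tail_p(z):
    """Two-tail area of the unit normal curve beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2))

def compare_r_with_rho(r, rho, n):
    """Standard z score and two-tail P for the H0 that the parameter equals rho."""
    s_z = 1 / math.sqrt(n - 3)
    z = (fisher_z(r) - fisher_z(rho)) / s_z
    return z, two_tail_p(z)

# Values of Example 8.2.5 (ii): r = +0.89, proposed rho = +0.60, n = 10.
z, p = compare_r_with_rho(0.89, 0.60, 10)
print(f"z = {z:.2f}, P = {p:.3f}")
```

The same function applies to a negative r and a negative proposed value, since the transformation is an odd function of r.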


Computation from grouped data

A code method is used in computing r from
a bivariate frequency distribution of a large
sample.

Bivariate frequency distribution :

It is the frequency distribution of the paired
scores of two variables for the individuals of a
sample. It may be either arranged in a two-way
table as follows, or represented graphically as a
scattergram (pages 33-34), or drawn as a three-dimensional
bell-shaped/elliptical mound-like
distribution (Fig. 8.3).

(a) Each variable is divided into a number
of classes as in the case of univariate frequency
distributions (pages 15-16).

(b) The classes of the variable Y are
arranged in a descending order from top to
bottom in the rows of a table while those of
the variable X are arranged in an ascending
order from left to right in the columns (Table
8.7). A two-way table results with k and l
numbers of classes for X and Y respectively,
and k × l number of cells. Thus, the bottom
left corner cell accommodates the frequency of
individuals with scores of both variables falling
in their respective lowest classes while the top
right corner cell houses the frequency of those
with scores in the highest classes of both
variables.

(c) According to his scores in the two
variables, each individual is entered as a tally
in the cell corresponding to the relevant
combination of class intervals of the two
variables.

(d) The total number of tallies in each cell is
entered at its centre as the cell frequency.

(e) The sum of all cell frequencies of each
row is entered as the marginal frequency of the
row ($f_Y$) in the extreme right column of the
table. The $f_Y$ values in this column constitute
the frequency distribution of Y scores (marginal
distribution of Y).

(f) The sum of all cell frequencies of each
column is entered as the marginal frequency of
the column ($f_X$) in the bottom row of the table.
The $f_X$ values in this row constitute the
frequency distribution of X scores (marginal
distribution of X).

$$\sum f_Y = \sum f_X = n.$$

Code method for computing r :

(a) The bivariate frequency distribution is
used in a correlation table with its cell
frequencies at the centres of the respective cells
(Table 8.8). The marginal frequency ($f_Y$) of
each row and the marginal frequency ($f_X$) of
each column are entered in respectively the
marginal column and row for those values.
Additional marginal columns and rows are
added to the right of the $f_Y$ column and below
the $f_X$ row, respectively.

(b) The midpoint of a centrally located class
interval of each variable is chosen as its
assumed mean (A) and given a code number
of 0 (Example 8.2.6).

Fig. 8.3. Bivariate normal frequency distribution.




Table 8.7. Tabulation of a bivariate frequency distribution.

X variable : Age in years (classes in ascending order from left to right), with marginal frequencies $f_X$ :

Age (years) : 19.5-27.5  27.5-35.5  35.5-43.5  43.5-51.5  51.5-59.5  59.5-67.5  67.5-75.5
f_X         :     6         11         10         17         16         13          7

Y variable : Cardiac index in litres (classes in descending order from top to bottom), with marginal frequencies $f_Y$ :

Cardiac index : 3.42-3.59  3.25-3.42  3.08-3.25  2.91-3.08  2.74-2.91  2.57-2.74  2.40-2.57
f_Y           :     5         11         10         15         18         12          9

n = 80 (cell frequencies of the two-way table omitted).





(c) Code numbers ($x'$ and $y'$) are assigned
to the midpoints of other intervals of X and Y
variables, respectively, according to the orders
and the directions (signs) of the deviations of
those midpoints from the respective A values.
For example, the class interval of X just below
that of its A ($x'$ = 0) bears the $x'$ number of
-1; the interval of Y just higher than that of its
A ($y'$ = 0) is given the $y'$ number of +1. These
code numbers are entered in the marginal row
for $x'$ and the marginal column for $y'$,
respectively.

(d) The $f_X$ of each column is multiplied by
the $x'$ of that column to give $fx'$ which is again
multiplied by $x'$ to yield $fx'^2$ of that column.
The sums of $fx'$ and $fx'^2$ values of all the
columns give respectively $\sum fx'$ and $\sum fx'^2$.

(e) The $f_Y$ and $y'$ values of each row are
similarly used in computing $fy'$, $fy'^2$, $\sum fy'$
and $\sum fy'^2$ values.

(f) The means ($\bar{x}'$ and $\bar{y}'$) and the SDs ($s_x'$
and $s_y'$) of the coded values of X and Y are
then computed.

$$\bar{x}' = \frac{\sum fx'}{n}\,;\qquad \bar{y}' = \frac{\sum fy'}{n}\,;$$

$$s_x' = \frac{1}{n}\sqrt{n\sum fx'^2 - \left(\sum fx'\right)^2}\,;\qquad s_y' = \frac{1}{n}\sqrt{n\sum fy'^2 - \left(\sum fy'\right)^2}.$$

(g) The product of $x'$ and $y'$ of each cell is
entered at its top left corner. Each such $x'y'$
value is multiplied by the frequency of that
cell, this product is entered at its bottom right
corner, and such products of all the cells in
each row (or column) are added to give
$\sum x'y'_{row}$ (or $\sum x'y'_{col}$). All entries of $\sum x'y'_{row}$ (or
$\sum x'y'_{col}$) are added to give $\sum x'y'$.

$$r = \frac{\sum x'y' - n\,\bar{x}'\bar{y}'}{n\,s_x' s_y'}.$$


[Table 8.8. Correlation table for computing r by the code method, prepared from the bivariate frequency distribution of Table 8.7 (X variable : Age in years ; Y variable : Cardiac index in litres). $\sum x'y'$ = -194.]


Example 8.2.6.

Use the bivariate frequency distribution of Table 8.7 to compute r between age (yrs) and cardiac index
(litres). Interpret r with respect to the $H_0$ proposing a population correlation coefficient of -0.85.

Solution :

The bivariate frequency distribution of Table 8.7 is rearranged in Table 8.8 for computing r by the code
method.

(a) The midpoint of the interval 43.5-51.5 is chosen as the assumed mean A of X (age) and given the
code number $x'$ of 0. The midpoint of the interval 2.91-3.08 is chosen as the A of Y (cardiac index) and given the code
number $y'$ of 0.

(b) The midpoints of other intervals of X and Y are then given $x'$ and $y'$ code numbers
according to their deviations from the respective assumed means (A).

(c) Using the $f_X$ and $x'$ values of each column of Table 8.8, $fx'$ and $fx'^2$ are computed.
The $f_Y$ and $y'$ of each row are similarly used in computing $fy'$ and $fy'^2$.

(d) The means ($\bar{x}'$ and $\bar{y}'$) and the SDs ($s_x'$ and $s_y'$) of the coded values of X and Y are computed.

$$\bar{x}' = \frac{\sum fx'}{n} = \frac{13}{80} = 0.16\,;\qquad \bar{y}' = \frac{\sum fy'}{n} = \frac{-22}{80} = -0.28\,;$$

$$s_x' = \frac{1}{n}\sqrt{n\sum fx'^2 - \left(\sum fx'\right)^2} = \frac{1}{80}\sqrt{80\times 239 - (13)^2} = 1.721\,;$$

$$s_y' = \frac{1}{n}\sqrt{n\sum fy'^2 - \left(\sum fy'\right)^2} = \frac{1}{80}\sqrt{80\times 246 - (-22)^2} = 1.732.$$

(e) The product of $x'$ and $y'$ of each cell is entered at its top left corner and multiplied by the cell
frequency; this product is entered at the bottom right corner of the cell. For example, for the bottom right
corner cell of the table, $x'y'$ = -9 and its product with the cell frequency comes to 5 × (-9) = -45.
The last-mentioned products are totalled for all cells in each row (or column) to
give $\sum x'y'_{row}$ (or $\sum x'y'_{col}$). All $\sum x'y'_{row}$ (or $\sum x'y'_{col}$) values are totalled to give $\sum x'y'$ which amounts to
-194.

(f) r is then computed using the values of Table 8.8.

$$r = \frac{\sum x'y' - n\,\bar{x}'\bar{y}'}{n\,s_x' s_y'} = \frac{-194 - 80\times 0.16\times(-0.28)}{80\times 1.721\times 1.732} = -0.799.$$

To test the $H_0$ that the computed r is not significantly different from the proposed $\rho$ of -0.85, r
and $\rho$ are subjected to Fisher's z transformation to give $z_r$ and $\zeta$, respectively. The computed $z_r$ is then
converted to the standard z score.

$$z_r = 1.1513 \log\left[\frac{1+r}{1-r}\right] = 1.1513 \log\left[\frac{1-0.799}{1+0.799}\right] = -1.096.$$

$$\zeta = 1.1513 \log\left[\frac{1+\rho}{1-\rho}\right] = 1.1513 \log\left[\frac{1-0.85}{1+0.85}\right] = -1.256.$$







$$s_z = \frac{1}{\sqrt{n-3}} = \frac{1}{\sqrt{80-3}} = 0.1140\,;\qquad z = \frac{z_r - \zeta}{s_z} = \frac{-1.096 + 1.256}{0.1140} = 1.40.$$

Table A of unit normal curve areas is then used to find the two-tail probability P for the $H_0$ being
correct.

P = 2 [0.5000 - (area of unit normal curve from its $\mu$ to the computed z of 1.40)]
= 2 [0.5000 - 0.4192] = 0.16.

As P > 0.05, it is considered too high and the $H_0$ cannot be rejected. So, the computed r does not differ
significantly from the given $\rho$ of -0.85 (P > 0.05).
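
The arithmetic of the code method in Example 8.2.6 can be sketched as below. The marginal frequencies are those read from the margins of Table 8.7, the sum of the cell products (-194) is taken from Table 8.8, and the function name `coded_mean_sd` is an assumption of this sketch; the individual cell frequencies are not needed once that sum is known.

```python
import math

def coded_mean_sd(freqs, codes):
    """Mean and SD of coded values from marginal frequencies and code numbers."""
    n = sum(freqs)
    sum_f = sum(f * c for f, c in zip(freqs, codes))
    sum_f2 = sum(f * c * c for f, c in zip(freqs, codes))
    mean = sum_f / n
    sd = math.sqrt(n * sum_f2 - sum_f ** 2) / n
    return n, mean, sd

# Marginal frequencies of age classes (ascending) with x' codes -3 ... +3.
f_x = [6, 11, 10, 17, 16, 13, 7]
x_codes = [-3, -2, -1, 0, 1, 2, 3]
# Marginal frequencies of cardiac index classes (descending) with y' codes +3 ... -3.
f_y = [5, 11, 10, 15, 18, 12, 9]
y_codes = [3, 2, 1, 0, -1, -2, -3]
sum_xy_coded = -194  # total of frequency-weighted x'y' products (Table 8.8)

n, mean_x, sd_x = coded_mean_sd(f_x, x_codes)
_, mean_y, sd_y = coded_mean_sd(f_y, y_codes)
r = (sum_xy_coded - n * mean_x * mean_y) / (n * sd_x * sd_y)
print(f"n = {n}, r = {r:.3f}")
```

Coding changes only the origin and the unit of each variable, so the r of the coded values is the r of the original scores.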


Significance of difference between r
scores

Fisher's z transformation (pages 147-148) is
used to find whether or not there is a
significant difference between the product-moment
correlation coefficients ($r_1$ and $r_2$) of
two given variables ($X_1$ and $X_2$) in two
different samples, each consisting of more than
50 cases ($n_1$ > 50, $n_2$ > 50). The $H_0$ contends
that there is no significant difference between
the two r values; in other words, $H_0$ proposes
that the parametric correlation coefficients, $\rho_1$
and $\rho_2$, of which $r_1$ and $r_2$ may serve as the
respective estimates, are identical.

$H_0$ : $\rho_1 = \rho_2$ ; $H_A$ : $\rho_1 \neq \rho_2$.

To work out the probability (P) of this $H_0$
being correct, each of the sample r values is
logarithmically converted to the corresponding
$z_r$, and the difference ($z_{r_1} - z_{r_2}$) between the two
$z_r$ values is converted to the standard z score
using the SE ($s_{z_{r_1} - z_{r_2}}$) of the difference between
them.

$$z_{r_1} = 1.1513 \log\left[\frac{1+r_1}{1-r_1}\right];\qquad z_{r_2} = 1.1513 \log\left[\frac{1+r_2}{1-r_2}\right];$$

$$s_{z_{r_1} - z_{r_2}} = \sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}\,;\qquad z = \frac{z_{r_1} - z_{r_2}}{s_{z_{r_1} - z_{r_2}}}.$$

The probability (P) of correctness of the $H_0$
is obtained using the fractional area of the unit
normal curve (Table A of Appendix).

P = 2 [0.5000 - (area of unit normal curve from
its $\mu$ to the computed z)].

If the P thus worked out is found not to
exceed 0.05 or any other chosen significance
level, P is considered too low ; the $H_0$ is then
rejected and the two computed r values are
considered to differ significantly (P ≤ α). But
if the computed P exceeds 0.05 or any other
chosen α, P is considered too high, the $H_0$
cannot be rejected, and the two r values are
considered not to differ significantly (P > α).

Example 8.2.7.


The product-moment r values between two given variables amounted respectively to +0.63 and
+0.52 in two samples of respectively 57 and 60 cockroaches sampled from two different habitats. Find
whether or not there is a significant difference between the two sample r values (α = 0.01).

Solution :

Sample 1 : $r_1$ = + 0.63 ; $n_1$ = 57.

Sample 2 : $r_2$ = + 0.52 ; $n_2$ = 60.




Both $r_1$ and $r_2$ are converted to the respective $z_r$ values ; the SE ($s_{z_{r_1} - z_{r_2}}$) of the difference between the $z_r$
coefficients is worked out and used in transforming the difference between the $z_r$ values to the standard z
score.

$$z_{r_1} = 1.1513 \log\left[\frac{1+r_1}{1-r_1}\right] = 1.1513 \log\left[\frac{1+0.63}{1-0.63}\right] = 0.741\,;$$

$$z_{r_2} = 1.1513 \log\left[\frac{1+r_2}{1-r_2}\right] = 1.1513 \log\left[\frac{1+0.52}{1-0.52}\right] = 0.576\,;$$

$$s_{z_{r_1} - z_{r_2}} = \sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}} = \sqrt{\frac{1}{57-3} + \frac{1}{60-3}} = 0.1899\,;\qquad z = \frac{z_{r_1} - z_{r_2}}{s_{z_{r_1} - z_{r_2}}} = \frac{0.741 - 0.576}{0.1899} = 0.87.$$

The probability (P) of the $H_0$ being correct is worked out using the unit normal curve table (Table A of
Appendix).

P = 2 [0.5000 - (area of unit normal curve from its $\mu$ to the computed z)]
= 2 [0.5000 - 0.3078] = 0.38.

As the computed P exceeds the chosen α of 0.01, it is considered too high. So, the $H_0$ is retained, and
it is inferred that there is no significant difference between the r values of the two samples (P > 0.01).
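
A minimal sketch of this two-sample comparison follows, using the values of Example 8.2.7. The function name `compare_two_r` is hypothetical; Fisher's transformation is taken as `math.atanh`, which is assumed equivalent to the 1.1513 log form, and the normal curve area is computed from the complementary error function rather than read from Table A.

```python
import math

def compare_two_r(r1, n1, r2, n2):
    """Standard z score and two-tail P for the H0 that rho1 equals rho2."""
    z_r1 = math.atanh(r1)  # Fisher's z of the first sample r
    z_r2 = math.atanh(r2)  # Fisher's z of the second sample r
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z_r1 - z_r2) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tail unit normal area
    return z, p

# Example 8.2.7: r1 = +0.63 (n1 = 57) and r2 = +0.52 (n2 = 60), alpha = 0.01.
z, p = compare_two_r(0.63, 57, 0.52, 60)
alpha = 0.01
print(f"z = {z:.2f}, P = {p:.2f}, reject H0: {p <= alpha}")
```

The decision rule mirrors the text: the H0 is rejected only when P does not exceed the chosen α.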


Example 8.2.8.

The product-moment r values between the intelligence test scores and the numerical reasoning test scores
amounted respectively to +0.65 and +0.50 in two samples of 83 and 54 students, respectively, from two
different educational institutions. Is there a significant difference between the r values of the two samples ?

Solution :

Sample 1 : $r_1$ = + 0.65 ; $n_1$ = 83.

Sample 2 : $r_2$ = + 0.50 ; $n_2$ = 54.

Both $r_1$ and $r_2$ are converted to $z_r$ and the difference between the computed $z_r$ values is changed into the
standard z score, using the SE ($s_{z_{r_1} - z_{r_2}}$) of their difference. The computed standard z score is then used to work
out the probability (P) of correctness of the $H_0$ of no difference.

$$z_{r_1} = 1.1513 \log\left[\frac{1+r_1}{1-r_1}\right] = 1.1513 \log\left[\frac{1+0.65}{1-0.65}\right] = 0.775\,;$$

$$z_{r_2} = 1.1513 \log\left[\frac{1+r_2}{1-r_2}\right] = 1.1513 \log\left[\frac{1+0.50}{1-0.50}\right] = 0.549\,;$$

$$s_{z_{r_1} - z_{r_2}} = \sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}} = \sqrt{\frac{1}{83-3} + \frac{1}{54-3}} = 0.1792\,;\qquad z = \frac{z_{r_1} - z_{r_2}}{s_{z_{r_1} - z_{r_2}}} = \frac{0.775 - 0.549}{0.1792} = 1.26.$$




P = 2 [0.5000 - (area of unit normal curve from its $\mu$ to the computed z)]
= 2 [0.5000 - 0.3962] = 0.21.

As the probability (P) of the $H_0$ being correct is higher than 0.05, it is considered too high. So, the $H_0$
is retained, and it is inferred that there is no significant difference between the r values of the two samples
(P > 0.05).




8.3 PARTIAL CORRELATION 

Partial correlation, though a special form of
correlation between two given variables, is a
method of multivariate statistics involving more
than two variables in a sample. The product-moment
r, though intended to measure the
simple correlation between two given variables,
results partly from the effects of other variables
on both of them. Partial correlation aims at
eliminating (partialling out) such effects of the
other variables on both the given variables in
common so as to get a more precise measure
of the correlation directly between the given
variables. It is that part of the product-moment
r between two given variables, which remains
after the elimination of the component of their
association, arising from the effects of the other
variables on both of them. There are different
orders of partial correlation according to the
number of variables to be eliminated or
partialled out ; the product-moment r between
any two variables, involving no elimination of
any other variable, is considered as the zero-order
r.

Assumptions

For using partial r, it should be reasonable
to assume that,

(a) all the variables involved are continuous
measurement variables ;

(b) their scores have unimodal and fairly
symmetrical distributions in the population,
without marked skewness ;

(c) the paired scores of each pair of
variables in an individual or case are
independent of such paired scores of all other
individuals or cases in the sample ;

(d) there is a linear association between the
scores of each pair of variables.

First-order partial r

This is the correlation between two
variables ($X_1$ and $X_2$), partialling out another
variable ($X_3$) correlated with both of them.
Thus, of the three correlated variables, the first-order
partial r holds one constant to remove its
effect on the correlation between the other two.
For example, $r_{12.3}$ partials out the variable $X_3$
to measure the correlation between $X_1$ and $X_2$
free from the effect of $X_3$. Here, any $X_1$ score
consists of two independent and uncorrelated
components, one correlated with $X_3$ and the
other independent of $X_3$ ; similarly, each $X_2$
score has one component correlated with $X_3$
and another component independent of the
latter. The zero-order r ($r_{12}$) between $X_1$ and $X_2$
is partly due to their respective components
correlated with $X_3$ ; $r_{12.3}$ partials out or
eliminates these components associated with $X_3$
and measures the correlation between the
residual components of $X_1$ and $X_2$ free from
the influence of $X_3$.

The partial r of any order is computed
using the product-moment r and the lower
orders of partial r. For a first-order partial r,

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1-r_{13}^2)(1-r_{23}^2)}}\,,$$

where $r_{12}$, $r_{13}$ and $r_{23}$ are zero-order product-moment
r values between respectively $X_1$ and
$X_2$, $X_1$ and $X_3$, and $X_2$ and $X_3$. If, in a sample
of humans, $r_{12}$ is the correlation between
cardiac output ($X_1$) and venous return ($X_2$), $r_{13}$
is the correlation between $X_1$ and blood
pressure ($X_3$), and $r_{23}$ is the correlation between
$X_2$ and $X_3$, $r_{12.3}$ is the partial r between cardiac
output and venous return, eliminating the
effects of blood pressure ; similarly, if
intelligence test scores ($X_1$), anxiety test scores
($X_2$) and age ($X_3$) are correlated with each
other in a sample of students, $r_{12.3}$ is the partial
r between intelligence and anxiety scores,
eliminating the effects of age ; again, $r_{12.3}$ is
the first-order partial r between oxygen
consumption ($X_1$) and tracheal ventilation ($X_2$),
partialling out the effects of atmospheric sulfur
dioxide concentration ($X_3$), in a sample of
insects. Partial r values range from -1.00 to
+1.00.

Other first-order partial r values may also
be computed similarly.

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1-r_{12}^2)(1-r_{23}^2)}}\,;\qquad r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{(1-r_{12}^2)(1-r_{13}^2)}}.$$

For example, in one case cited above, $r_{13.2}$
is the partial r between intelligence test scores
($X_1$) and age ($X_3$), partialling out anxiety test
scores ($X_2$). Similarly, $r_{13.2}$ is the partial r
between the rate of plant stem growth ($X_1$) and
the soil pH ($X_3$), eliminating the effects of soil
moisture ($X_2$), when the variables are correlated
with each other.

The null hypothesis ($H_0$) proposes that the
partial correlation between the given variables
amounts to zero in the population, and that the
computed partial r has resulted only due to
chances associated with random sampling. To
test the probability P of this $H_0$ being correct,
the computed partial r is transformed into the t
score and the latter is compared with critical t
scores having the same df. For $r_{12.3}$,

$$s_{r_{12.3}} = \sqrt{\frac{1-r_{12.3}^2}{n-3}}\,;\qquad t = \frac{r_{12.3}}{s_{r_{12.3}}}\,;\qquad df = n-3.$$

The computed partial r is considered
significant only if the t score worked out from
it either exceeds or equals the critical t score
for a chosen level of significance, not higher
than 0.05 (P ≤ α).

Second-order partial r

The second and still higher orders of partial
r are seldom used. The second-order partial r
involves four inter-correlated variables and
measures the correlation between two of them,
partialling out the other two. For example,
$r_{12.34}$ correlates $X_1$ and $X_2$, eliminating the
effects of $X_3$ and $X_4$. The significance of the
computed $r_{12.34}$ is tested by transforming it to t
score and comparing the latter to critical t
scores.

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}\,r_{24.3}}{\sqrt{(1-r_{14.3}^2)(1-r_{24.3}^2)}}\,;\qquad \text{or,}\quad r_{12.34} = \frac{r_{12.4} - r_{13.4}\,r_{23.4}}{\sqrt{(1-r_{13.4}^2)(1-r_{23.4}^2)}}\,;$$

$$s_{r_{12.34}} = \sqrt{\frac{1-r_{12.34}^2}{n-4}}\,;\qquad t = \frac{r_{12.34}}{s_{r_{12.34}}}\,;\qquad df = n-4.$$

Example 8.3.1.

In a group of 153 students, the product-moment r values between intelligence test scores ($X_1$), anxiety
test scores ($X_2$) and chronological age ($X_3$) were found to be as follows : $r_{12}$ = + 0.46, $r_{13}$ = + 0.35, $r_{23}$ =
+ 0.17. Compute $r_{12.3}$ and find whether it is significant or not.




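Solution :

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1-r_{13}^2)(1-r_{23}^2)}} = \frac{0.46 - 0.35\times 0.17}{\sqrt{(1-0.35^2)(1-0.17^2)}} = +0.434\,;$$

$$t = \frac{r_{12.3}}{\sqrt{(1-r_{12.3}^2)/(n-3)}} = \frac{0.434}{\sqrt{\dfrac{1-(0.434)^2}{153-3}}} = 5.9000\,;\qquad df = n-3 = 153-3 = 150 \approx \infty.$$

For a two-tail test, critical $t_{.01(\infty)}$ = 2.576, and critical $t_{.001(\infty)}$ = 3.291 (Table B). The computed t is thus
higher than the critical t for the 0.001 level. So, the partial r is significant (P < 0.001).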
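
The first-order partial r and its t score can be sketched as below with the values of Example 8.3.1. The function names are illustrative assumptions, not part of the text; only the formulas given above are used.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial r between X1 and X2, partialling out X3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def t_for_partial_r(r_partial, n):
    """Student's t for a first-order partial r, with df = n - 3."""
    s_r = math.sqrt((1 - r_partial ** 2) / (n - 3))
    return r_partial / s_r, n - 3

# Example 8.3.1: intelligence (X1), anxiety (X2) and age (X3) in 153 students.
r12_3 = partial_r(0.46, 0.35, 0.17)
t, df = t_for_partial_r(r12_3, 153)
print(f"r12.3 = {r12_3:+.3f}, t = {t:.2f}, df = {df}")
```

The other first-order coefficients follow from the same function by permuting its arguments; for instance, `partial_r(r13, r12, r23)` gives $r_{13.2}$.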


Example 8.3.2.

In a sample of 63 athletes, the product-moment r values between the stroke volume of heart ($X_1$), the
venous return ($X_2$) and the vascular peripheral resistance ($X_3$) were found to be as follows : $r_{12}$ = + 0.65,
$r_{13}$ = - 0.12, $r_{23}$ = - 0.25. Compute the partial r between stroke volume and venous return, eliminating the
effect of peripheral resistance, and test its significance.

Solution :

$$r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.65 - (-0.12) \times (-0.25)}{\sqrt{[1 - (-0.12)^2][1 - (-0.25)^2]}} = + 0.645 ;$$

$$t = \frac{r_{12.3}}{\sqrt{(1 - r_{12.3}^2)/(n - 3)}} = \frac{0.645}{\sqrt{\dfrac{1 - (0.645)^2}{63 - 3}}} = 6.538 ; \quad df = n - 3 = 63 - 3 = 60.$$

For a two-tail test, critical $t_{.01(60)}$ = 2.660, and critical $t_{.001(60)}$ = 3.460 (Table B). The computed t is thus
higher than the critical t for the 0.001 level of significance. So, the partial r is significant (P < 0.001).
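The arithmetic of the first-order partial r and its t score can be retraced in a few lines. The sketch below is only an illustration with assumed function names; it rounds the partial r to three decimals before computing t, as the worked solutions do, and uses the r values and sample size of Example 8.3.1.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial r between X1 and X2, partialling out X3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def t_for_partial_r(r, n):
    """t score and df for a first-order partial r (df = n - 3)."""
    df = n - 3
    s_r = math.sqrt((1 - r ** 2) / df)
    return r / s_r, df

# Example 8.3.1: intelligence (X1), anxiety (X2), age (X3); n = 153.
r = round(partial_r(0.46, 0.35, 0.17), 3)
t, df = t_for_partial_r(r, 153)
print(r, round(t, 3), df)
```

The computed t is then compared by hand with the tabulated critical t for the df obtained; the table look-up itself is not automated here.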


8.4 MULTIPLE CORRELATION 

Multiple correlation comes under multivariate
statistics as it involves more than two
variables in a sample. It is a measure of the
relation between one variable (called the
criterion) and the weighted sum of two or
more other variables (called the predictors).
The multiple linear correlation coefficient (R) is
a special form of the product-moment r and
measures the magnitude of the linear
relationship between a given variable and the
weighted sum of two or more other variables.

Assumptions 

For computing and applying R, it should be
reasonable to assume that :

(a) all the variables involved, both criterion
and predictors, are continuous measurement
variables ;

(b) their scores have unimodal and fairly
symmetrical distributions in the population,
without marked skewness ;




(c) the paired scores of each pair of
variables in an individual or case occur in the
sample at random and independent of such
scores of other individuals or cases ;

(d) there is a linear association between the
scores of each pair of variables.

Multiple correlation with three variables

Such a multiple linear correlation coefficient
($R_{1.23}$) may be computed between a criterion
variable ($X_1$) and a weighted sum of two
predictor variables ($X_2$ and $X_3$) using the beta
coefficients, $\beta_2$ and $\beta_3$. $\beta_2$ and $\beta_3$ are those
proportions of the total variance of the criterion
$X_1$ as are associated with the variances of
respectively $X_2$ and $X_3$. Where $r_{12}$, $r_{13}$ and
$r_{23}$ are the product-moment r values between $X_1$
and $X_2$, $X_1$ and $X_3$, and $X_2$ and $X_3$ respectively.

$$\beta_2 = \frac{r_{12} - r_{13}\, r_{23}}{1 - r_{23}^2} ; \quad \beta_3 = \frac{r_{13} - r_{12}\, r_{23}}{1 - r_{23}^2} ;$$

$$R_{1.23} = \sqrt{\beta_2\, r_{12} + \beta_3\, r_{13}}.$$

If the multiple correlation is perfectly linear,
R amounts to 1.00 exactly. If R amounts to 0,
the multiple correlation does not exist, but a
nonlinear correlation cannot be straightway
negated.

Or, directly from the r values,

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2}}.$$

$R_{2.13}$ and $R_{3.12}$ are also computed similarly.

$$R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{13}^2}} ; \quad R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{12}^2}}.$$

The magnitude of the computed $R_{1.23}$ tends
to be higher than the product-moment $r_{12}$
between the criterion ($X_1$) and a predictor ($X_2$)
in the following cases :

(i) if the product-moment $r_{23}$ between the
two predictors ($X_2$ and $X_3$) is low ;

(ii) if the criterion has a high $r_{12}$ or $r_{13}$ with
either predictor ;

(iii) if $r_{12}$ is substantially decreased by the
negative effect of the other predictor $X_3$, called
the suppression variable, because of a high $r_{23}$
and a negative, poor or zero $r_{13}$. This happens
when the suppression variable $X_3$ has a
variance common with the other predictor $X_2$,
but no such common variance with the criterion
$X_1$ ; there may still exist a positive $r_{12}$ because
of a variance common to $X_1$ and $X_2$, but it is
partly suppressed by the negative weight of the
other variance, common to $X_2$ and $X_3$ and
absent in $X_1$. $R_{1.23}$ attains a value higher than
$r_{12}$ by minimizing this negative weight of the
suppression variable.

Coefficients of multiple determination and
non-determination :

The coefficient of multiple determination
($R^2$) is a measure of that proportion of variance
of the criterion which comes from the
combined contribution of the predictors. In
other words, it is a measure of that proportion
of the variation of the criterion, which is
determined by the variations of the predictors.
$R^2$ finds application in the analysis of multiple
regression and in assessing the relative
importance of multiple correlations of different
magnitudes. For a multiple correlation between
a criterion ($X_1$) and two predictors ($X_2$ and $X_3$),

$$R_{1.23}^2 = \beta_2\, r_{12} + \beta_3\, r_{13}.$$

That remaining proportion of the variance of
the criterion as is independent of the combined
contribution of the predictors, is given by the
coefficient of multiple non-determination ($K^2$)
which equals $1 - R^2$. Thus, for a multiple
correlation between the criterion ($X_1$) and two
predictors ($X_2$ and $X_3$),

$$K_{1.23}^2 = 1 - R_{1.23}^2.$$
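As a numerical check on the three-variable formulas, the beta-coefficient route and the direct route to $R_{1.23}$ can be compared. This is a minimal sketch with assumed function names; the r values ($r_{12}$ = 0.65, $r_{13}$ = - 0.12, $r_{23}$ = - 0.25) are taken purely as illustrative inputs.

```python
import math

def beta_coefficients(r12, r13, r23):
    """Beta coefficients of the two predictors X2 and X3 for the criterion X1."""
    beta2 = (r12 - r13 * r23) / (1 - r23 ** 2)
    beta3 = (r13 - r12 * r23) / (1 - r23 ** 2)
    return beta2, beta3

def multiple_r_from_betas(r12, r13, r23):
    """R(1.23) as the square root of beta2*r12 + beta3*r13."""
    beta2, beta3 = beta_coefficients(r12, r13, r23)
    return math.sqrt(beta2 * r12 + beta3 * r13)

def multiple_r_direct(r12, r13, r23):
    """R(1.23) computed directly from the three product-moment r values."""
    numerator = r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23
    return math.sqrt(numerator / (1 - r23 ** 2))

r12, r13, r23 = 0.65, -0.12, -0.25        # illustrative r values
R = multiple_r_from_betas(r12, r13, r23)
determination = R ** 2                    # coefficient of multiple determination
non_determination = 1 - R ** 2            # coefficient of multiple non-determination
print(round(R, 3), round(determination, 3), round(non_determination, 3))
```

The two routes are algebraically the same expression, so any difference between them in hand computation comes only from rounding the beta coefficients.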








Interpretation :

To test the $H_0$ that the computed R is not
significantly different from 0, R is converted to
t which is compared with critical t scores
(Table B of Appendix) for interpretation.

$$s_R = \frac{1}{\sqrt{n - 3}} ; \quad t = \frac{R}{s_R} ; \quad df = n - 3.$$

Alternatively, the computed R is converted
to the variance ratio (F) which is compared
with critical F ratios (Table H of Appendix) for
the combination of specified $df_1$ and $df_2$ (viz.,
degrees of freedom of respectively greater and
lesser mean squares in the table).

$$F = \frac{R^2 (n - g - 1)}{g (1 - R^2)} ; \quad df_1 = g ; \quad df_2 = n - g - 1 ;$$

where g is the number of predictors. For
different multiple correlations with three
variables, $df_1 = g = 2$ ; $df_2 = n - g - 1 = n - 3$ ;

$$F = \frac{R_{1.23}^2 (n - 3)}{2 (1 - R_{1.23}^2)} ; \quad F = \frac{R_{2.13}^2 (n - 3)}{2 (1 - R_{2.13}^2)} ; \quad F = \frac{R_{3.12}^2 (n - 3)}{2 (1 - R_{3.12}^2)}.$$

The computed R is significantly different
from 0 at or below that level of significance
which has a critical F respectively equal to or
lower than the computed F (P ≤ α).

Multiple correlation with more than three
variables

Multiple correlation with one criterion and
more than two predictors can also be computed
using the beta coefficients. Where m is the total
number of variables and g is the number of
predictors.

$$R_{1.23 \ldots m} = \sqrt{\beta_2\, r_{12} + \beta_3\, r_{13} + \cdots + \beta_m\, r_{1m}} ;$$

$$F = \frac{R_{1.23 \ldots m}^2 (n - g - 1)}{g (1 - R_{1.23 \ldots m}^2)} ; \quad df_1 = g ; \quad df_2 = n - g - 1.$$



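Example 8.4.1

Using the data of Example 8.3.2, compute the multiple correlation coefficient $R_{1.23}$ and test its significance.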

Solution :

(a) Beta coefficients are computed from the r values between the variables, and used in computing the
multiple correlation coefficient $R_{1.23}$.

$$\beta_2 = \frac{r_{12} - r_{13}\, r_{23}}{1 - r_{23}^2} = \frac{0.65 - (-0.12) \times (-0.25)}{1 - (-0.25)^2} = 0.661.$$

$$\beta_3 = \frac{r_{13} - r_{12}\, r_{23}}{1 - r_{23}^2} = \frac{-0.12 - 0.65 \times (-0.25)}{1 - (-0.25)^2} = 0.045.$$

$$R_{1.23} = \sqrt{\beta_2\, r_{12} + \beta_3\, r_{13}} = \sqrt{0.661 \times 0.65 + 0.045 \times (-0.12)} = 0.651.$$

(b) To test the $H_0$ of no correlation, $R_{1.23}$ is converted to t score which is compared with critical t scores.





$$s_R = \frac{1}{\sqrt{n - 3}} = \frac{1}{\sqrt{63 - 3}} = 0.129 ; \quad t = \frac{R_{1.23}}{s_R} = \frac{0.651}{0.129} \approx 5.05 ; \quad df = n - 3 = 63 - 3 = 60.$$

For a two-tail test, critical $t_{.001(60)}$ = 3.460 (Table B). As the computed t is higher than even this critical
t score, the computed $R_{1.23}$ is significantly different from zero (P < 0.001).

Alternatively, $R_{1.23}$ is converted to F ratio which is compared with critical F ratios.

$$F = \frac{R_{1.23}^2 (n - 3)}{2 (1 - R_{1.23}^2)} = \frac{(0.651)^2 (63 - 3)}{2 [1 - (0.651)^2]} = 22.065 ;$$

$df_1$ = g = 2 ; $df_2$ = n - g - 1 = 63 - 2 - 1 = 60.

From Table H of Appendix, critical $F_{.05(2,60)}$ = 3.15 ; critical $F_{.01(2,60)}$ = 4.98. As the computed F is much
higher than the critical F for 0.01 level of significance, $R_{1.23}$ is significantly different from zero (P < 0.01).


Example 8.4.2. 

In a sample of 153 students, the product-moment r values between intelligence test scores ($X_1$), anxiety
test scores ($X_2$) and chronological age ($X_3$) were found to be as follows : $r_{12}$ = + 0.46, $r_{13}$ = + 0.35, $r_{23}$ =
+ 0.17. Compute and interpret the multiple correlation coefficient between intelligence test scores and the
combination of anxiety test scores and age.

Solution :

$$R_{1.23} = \sqrt{\frac{r_{12}^2 - 2\, r_{12}\, r_{13}\, r_{23} + r_{13}^2}{1 - r_{23}^2}} = \sqrt{\frac{(0.46)^2 - 2 \times 0.46 \times 0.35 \times 0.17 + (0.35)^2}{1 - (0.17)^2}} = 0.536.$$

$$F = \frac{R_{1.23}^2 (n - 3)}{2 (1 - R_{1.23}^2)} = \frac{(0.536)^2 (153 - 3)}{2 [1 - (0.536)^2]} = 30.233 ; \quad df_1 = g = 2 ;$$

$df_2$ = n - g - 1 = 153 - 2 - 1 = 150.

From Table H of Appendix, critical $F_{.05(2,150)}$ = 3.06 ; critical $F_{.01(2,150)}$ = 4.75. Since the computed F far
exceeds the critical F for 0.01 level of significance, $R_{1.23}$ is significantly different from zero (P < 0.01).
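The conversion of a computed R to the variance ratio can be sketched as follows. The function name and the way the critical F is supplied are assumptions made for illustration; the critical value must still be read from a printed table such as Table H, since the sketch does not compute it.

```python
def f_ratio_for_multiple_r(R, n, g):
    """Variance ratio for a multiple R with g predictors in a sample of size n.

    Returns F together with df1 = g and df2 = n - g - 1.
    """
    df1 = g
    df2 = n - g - 1
    F = (R ** 2) * df2 / (g * (1 - R ** 2))
    return F, df1, df2

# R = 0.536 from 153 students with two predictors (as in Example 8.4.2).
F, df1, df2 = f_ratio_for_multiple_r(0.536, 153, 2)

critical_f_01 = 4.75   # critical F for the 0.01 level, quoted from Table H for df 2 and 150
significant = F >= critical_f_01
print(round(F, 3), df1, df2, significant)
```

With g = 2 the expression reduces to $R^2 (n - 3) / [2 (1 - R^2)]$, which is the form used in the worked examples.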


8.5 SPEARMAN’S RANK CORRELATION

Spearman’s rank-difference correlation
coefficient (rho or $r_\rho$) is a nonparametric
counterpart of the product-moment r for simple
linear correlation. It explores the magnitude
and direction of the linear relation between two
variables in a sample when their values are
expressed in ranks. The computed $r_\rho$ values
range from -1.00 to +1.00, total absence of
correlation being indicated by 0.00. In contrast
to the product-moment r, it has fewer and
simpler assumptions, and is computed more
easily, but is less powerful.

Assumptions

For using Spearman’s rho, it should be
reasonable to assume that :








(a) there is a linear relationship between
the variables to be correlated ;

(b) the magnitude of each variable can be
expressed in ranks ; $r_\rho$ is thus applicable to
ordinal variables and to continuous ratio and
interval variables whose scores can be changed
into ranks, but not to nominal variables which
cannot be ranked ;

(c) the paired ranks (scores) of the variables
of each individual or case occur in the sample
at random and independent of all other paired
ranks (scores).

As the assumptions for normal distributions
and for continuous nature of the variables are
not required, it can be applied to both
continuous and discontinuous variables,
distributed normally or non-normally, as also to
small samples.

Computation

(a) Ranks are first assigned to the scores of
each variable separately, in either ascending or
descending order for both variables. If two or
more scores of a variable are identical, each of
such tied scores is given an average rank
which is the arithmetic mean of the ranks that
those scores of the tied set would have got if
they were successive scores instead of
identical. The score next to a tied set is given
the rank that it would have got if the tied
scores of the preceding set would have held
separate successive ranks.

However, if the individuals of the sample
were already ranked with respect to the
two variables in similar orders, those ranks can
be used instead of fresh ranking.

(b) The rank-difference (D) between the ranks of
the paired scores of two variables is worked
out for each individual or case.

(c) The squared differences ($D^2$) are totalled
to give $\Sigma D^2$ ...
orderly arrangement of the ranks of the sample,
and have bilaterally symmetric sampling
distributions.

(d) In one method, using the n number of
pairs of observations in the sample,

$$r_\rho = 1 - \frac{6 \Sigma D^2}{n (n^2 - 1)}.$$

In an alternative method using $\Sigma D^2_{max}$, which
is the maximum possible disarray,

$$\Sigma D^2_{max} = \frac{n (n^2 - 1)}{3} ; \quad r_\rho = 1 - \frac{2 \Sigma D^2}{\Sigma D^2_{max}}.$$



Inaccuracies :

(i) Inaccuracies of $r_\rho$ result from the use of
average ranks for tied scores, instead of their
true ranks.

(ii) Inaccuracies also arise from unequal
differences in magnitude between scores given
successive ranks. For example, ranks 1, 2, 3
and 4 might have been assigned to scores 10,
12, 17 and 25 respectively so that the
differences between the scores bearing
consecutive ranks vary widely, amounting to 2
between ranks 1 and 2, 5 between ranks 2 and
3 and 8 between ranks 3 and 4. Because $r_\rho$ is
computed from the ranks ignoring such
differences in magnitude between successive
scores, the computed $r_\rho$ would differ from that of r
computed from the same set of scores.

Significance of rho

(a) Where n < 30, the computed $r_\rho$ is
compared directly with critical $r_\rho$ values for
the given n, using a table of critical $r_\rho$ values
(Table D of Appendix). The computed $r_\rho$ is
significant at or below the level whose critical
$r_\rho$ is respectively equal to or lower than the
computed one (P ≤ α).

(b) Where n > 10, a two-tail t test is
undertaken to find the probability P of
correctness of the $H_0$ which contends that there
is no significant correlation between the
variables. Where $s_r$ is the SE of the computed
$r_\rho$,

$$s_r = \sqrt{\frac{1 - r_\rho^2}{n - 2}} ; \quad t = \frac{r_\rho}{s_r} ; \quad df = n - 2.$$

The computed $r_\rho$ is considered significant
if the t score computed from it exceeds or
equals the critical t score for the chosen α
(P ≤ α).

(c) Where n ≥ 25, the computed $r_\rho$ may be
alternatively converted to the z score which is
referred to unit normal curve areas (Table A) to
find the P of the $H_0$ being correct. The $r_\rho$ is
significant only if P is equal to or lower than
either the chosen α or the α of 0.05.

$$z = \frac{r_\rho}{s_r} = r_\rho \sqrt{n - 1} ;$$

P = 2 [0.5000 - (area of normal curve from its
μ to the computed z score)].


Example 8.5.1.

Compute $r_\rho$ between heights (cm) and weights (kg) of the following 12 students. Interpret the result.

Student :    1      2      3      4      5      6      7      8      9      10     11     12
Height  :   165    182    170    162    160    165    170    170    165    176    167    180
Weight  :   58.5   60.0   52.0   48.5   49.5   59.0   49.0   56.0   58.0   60.0   59.5   66.5

Solution :

Table 8.9. Table for computing $r_\rho$ between height and weight.

Student     Height (cm)                  Weight (kg)                 D = R1 - R2     D^2
            Scores (X)   Ranks (R1)      Scores (Y)   Ranks (R2)
   1           165           9              58.5          6             + 3           9
   2           182           1              60.0          2.5           - 1.5         2.25
   3           170           5              52.0          9             - 4          16
   4           162          11              48.5         12             - 1           1
   5           160          12              49.5         10             + 2           4
   6           165           9              59.0          5             + 4          16
   7           170           5              49.0         11             - 6          36
   8           170           5              56.0          8             - 3           9
   9           165           9              58.0          7             + 2           4
  10           176           3              60.0          2.5           + 0.5         0.25
  11           167           7              59.5          4             + 3           9
  12           180           2              66.5          1             + 1           1
Total                                                                                107.50



1. First method :

(a) Ranks are assigned in descending orders to the scores of height (X) and weight (Y) separately (Table
8.9). Average ranks are given to all the scores of each tied set. Thus, each of three X scores of 170 cm,
expected to occupy ranks 4, 5 and 6, gets the average rank of 5 :

Average rank = (Sum of expected ranks) / (Number of tied scores) = (4 + 5 + 6) / 3 = 5.

The next lower score of 167 consequently gets a rank of 7.

(b) The rank-difference (D) is worked out between the ranks ($R_1$ and $R_2$) of the X and Y scores of
each pair (Table 8.9).

(c) Each D is squared and the sum ($\Sigma D^2$) of all the squared rank-differences is used in
computing $r_\rho$.

$$r_\rho = 1 - \frac{6 \Sigma D^2}{n (n^2 - 1)} = 1 - \frac{6 \times 107.50}{12 (144 - 1)} = + 0.62.$$

2. Alternative method :

(a) D and $D^2$ for all the cases, as well as $\Sigma D^2$, are worked out as in the preceding method
(Table 8.9).

(b) $\Sigma D^2_{max}$ is computed from n, and used in working out $r_\rho$.

$$\Sigma D^2_{max} = \frac{n (n^2 - 1)}{3} = \frac{12 [(12)^2 - 1]}{3} = 572 ; \quad r_\rho = 1 - \frac{2 \Sigma D^2}{\Sigma D^2_{max}} = 1 - \frac{2 \times 107.50}{572} = + 0.62.$$

Interpretation :

To find the probability (P) of correctness of the $H_0$ proposing no correlation, a two-tail t test is
undertaken using the $r_\rho$ computed by any of the preceding methods.

$$s_r = \sqrt{\frac{1 - r_\rho^2}{n - 2}} = \sqrt{\frac{1 - (0.62)^2}{12 - 2}} = 0.248 ; \quad t = \frac{r_\rho}{s_r} = \frac{0.62}{0.248} = 2.500 ;$$

df = n - 2 = 12 - 2 = 10.

Two-tail critical t scores (df = 10) are quoted from Table B of Appendix :

$t_{.01(10)}$ = 3.169 ; $t_{.02(10)}$ = 2.764 ; $t_{.05(10)}$ = 2.228.

As the computed t is higher than the critical $t_{.05}$, the probability P of getting the computed $r_\rho$ by chances
of random sampling is less than 0.05. So, the $H_0$ can be rejected and the variables have a significant
correlation (P < 0.05).
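The ranking with average ranks for tied scores, and both routes to $r_\rho$, may be sketched in code as below. The function names are assumptions for illustration; the data are the heights and weights of Example 8.5.1, ranked in descending order, and t is computed from $r_\rho$ rounded to two decimals as in the worked solution.

```python
import math

def average_ranks(scores, descending=True):
    """Rank the scores, giving each tied score the mean of the ranks it spans."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=descending)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score is tied with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Return (sum of D squared, rho by the first method, rho by the alternative method)."""
    r1, r2 = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    rho_first = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    sum_d2_max = n * (n ** 2 - 1) / 3          # maximum possible disarray
    rho_alternative = 1 - 2 * sum_d2 / sum_d2_max
    return sum_d2, rho_first, rho_alternative

def t_for_rho(rho, n):
    """t score for rho, with df = n - 2."""
    return rho / math.sqrt((1 - rho ** 2) / (n - 2))

heights = [165, 182, 170, 162, 160, 165, 170, 170, 165, 176, 167, 180]
weights = [58.5, 60.0, 52.0, 48.5, 49.5, 59.0, 49.0, 56.0, 58.0, 60.0, 59.5, 66.5]

sum_d2, rho, rho_alt = spearman_rho(heights, weights)
t = t_for_rho(round(rho, 2), len(heights))
print(sum_d2, round(rho, 2), round(rho_alt, 2), round(t, 2))
```

Since $\Sigma D^2_{max}$ equals $n(n^2 - 1)/3$, the two formulae are the same expression written differently and must always agree.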


Example 8.5.2. 

Compute $r_\rho$ with the data of Example 8.2.3 and interpret the results.

Solution :

1. First method :

(a) Ranks are assigned in descending orders and separately to the vocabulary test scores (X) and the
typewriting test scores (Y), arranged in pairs in Table 8.10. Average ranks are given to all the scores of each
tied set. Thus, each of two Y scores of 48, expected to occupy ranks 4 and 5, gets the average rank of 4.5
while the next lower score of 41 gets the rank of 6.



Table 8.10. Table for computing $r_\rho$ between vocabulary and typewriting scores.

Case        Vocabulary                   Typewriting                 D = R1 - R2     D^2
            Scores (X)   Ranks (R1)      Scores (Y)   Ranks (R2)
   1            8            9              29            9               0           0
   2           22            4              48            4.5           - 0.5         0.25
   3           35            1              55            1               0           0
   4           19            5              49            3             + 2           4
   5           23            3              53            2             + 1           1
   6           13            7              41            6             + 1           1
   7            2           10              22           10               0           0
   8           14            6              38            7             - 1           1
   9           11            8              35            8               0           0
  10           25            2              48            4.5           - 2.5         6.25
Total                                                                                 13.50

(b) The rank-difference (D) is worked out between the ranks ($R_1$ and $R_2$) of the X and Y scores of each
pair.

(c) The rank-differences are squared and all the squared rank-differences are totalled to give the $\Sigma D^2$ of
13.50.

(d) $r_\rho$ is computed using $\Sigma D^2_{max}$ and $\Sigma D^2$.

$$\Sigma D^2_{max} = \frac{n (n^2 - 1)}{3} = \frac{10 (100 - 1)}{3} = 330 ; \quad r_\rho = 1 - \frac{2 \Sigma D^2}{\Sigma D^2_{max}} = 1 - \frac{2 \times 13.50}{330} = + 0.92.$$

2. Alternative method :

(a) D and $D^2$ for all the cases, as well as $\Sigma D^2$, are worked out as in the preceding method (Table 8.10).

(b) $r_\rho$ is next computed using the $\Sigma D^2$ thus worked out.

$$r_\rho = 1 - \frac{6 \Sigma D^2}{n (n^2 - 1)} = 1 - \frac{6 \times 13.50}{10 [(10)^2 - 1]} = + 0.92.$$

Interpretation :

Student’s t is computed from $r_\rho$, worked out by either of the above methods, for testing the $H_0$ of no
correlation.

$$s_r = \sqrt{\frac{1 - r_\rho^2}{n - 2}} = \sqrt{\frac{1 - (0.92)^2}{10 - 2}} = 0.139 ; \quad t = \frac{r_\rho}{s_r} = \frac{0.92}{0.139} = 6.619 ;$$

df = n - 2 = 10 - 2 = 8.

The computed t is compared with two-tail critical t scores (df = 8) from Table B.

$t_{.05(8)}$ = 2.306 ; $t_{.02(8)}$ = 2.896 ; $t_{.01(8)}$ = 3.355 ; $t_{.001(8)}$ = 5.041.

As the computed t is higher than even the critical $t_{.001}$, the computed $r_\rho$ is significant beyond the 0.001
level. So, there is a significant correlation between the variables (P < 0.001).






8.6 KENDALL’S RANK CORRELATION

Like Spearman's rho, the rank correlation
coefficient (tau or $\tau$), developed by M.G.
Kendall, is a nonparametric counterpart of
Pearson's r for simple linear correlation
between two variables. Like $r_\rho$, but in contrast
to Pearson’s r, tau is computed more easily,
has fewer and simpler assumptions, may be
applied to small samples, to both normal and
non-normal distributions, and to ordinal
(ranked) variables as well as such continuous
interval and ratio variables whose scores have
been changed into ranks ; but it is less
powerful than the product-moment r though
somewhat more powerful than Spearman’s rho.
Values of tau range from -1.00 to +1.00, a
value of 0.00 indicating the absence of
correlation. The values of tau and rho have
high and positive correlations with each other ;
but they differ from each other for the same
data because of their different scales.

Assumptions 

It should be reasonable to assume that : 

(a) there is a linear relationship between
the variables to be correlated ;

(b) the magnitudes of scores of each
variable can be expressed in ranks ; so, it is
not applicable to nominal variables which
cannot be ranked ;

(c) the paired ranks (scores) of the variables
for each individual occur at random and
independent of all other paired ranks (scores)
in the sample.

Assumptions for continuous measurement
variables and for the normal or near-normal
distribution of their scores are not required.

Computation 

(a) Ranks are first assigned in an ascending
order of magnitude to the scores of each
variable separately, giving average ranks to the
scores of each tied set as in the case of
computing $r_\rho$ (Table 8.11). But in case of
ordinal variables already in ranks in the data,
those ranks are directly used in computing $\tau$.

(b) The ranks ($R_1$) of the variable with no
tied score are serially arranged in an ascending
order along a column and each such $R_1$ rank is
paired in the adjoining column with the rank
($R_2$) of the other variable in the same
individual (Table 8.12). If neither or each of
the variables has tied scores, the ranks of any
of them are arranged as $R_1$ ranks in the ordered
manner pairing them with the ranks ($R_2$) of the
other variable in the respective individuals.

(c) Starting from the top of the column of
the paired $R_2$ ranks of the second variable,
each $R_2$ rank is taken in turn as the pivotal
rank and compared with all subsequent $R_2$
ranks in that column. Each subsequent $R_2$ rank
is counted as 1, 0.5 or 0 according as it is
higher than, equal to (tied with) or lower than
the particular pivotal rank. The count of all the
subsequent $R_2$ ranks for each pivotal rank is
entered as the $C_i$ value of the latter (Table
8.12). After using all the $R_2$ ranks as pivotal
ranks in turn and counting their respective $C_i$
values, the latter values are totalled to give
$\Sigma C_i$.

(d) To correct the inaccuracies due to
average ranks of tied scores, correction terms
($\Sigma T_1$ and $\Sigma T_2$) are computed for the respective
variables. In case of each variable, T is first
computed for each set of tied scores of that
variable, having t number of average ranks in
that set : T = t(t - 1). The T values of all the
tied sets of a variable are then totalled to give
$\Sigma T$ for that variable.

$$\Sigma T_1 = \Sigma [t_1 (t_1 - 1)] ; \quad \Sigma T_2 = \Sigma [t_2 (t_2 - 1)].$$

If, for example, a variable has two tied sets
consisting of 2 and 3 average ranks
respectively, $\Sigma T$ = 2(2 - 1) + 3(3 - 1) = 8. $\Sigma T$
amounts to 0 if a variable has no tied score.


(e) For a sample of size n, if both variables
have tied scores,

$$\tau = \frac{4 \Sigma C_i - n (n - 1)}{\sqrt{[n (n - 1) - \Sigma T_1][n (n - 1) - \Sigma T_2]}}.$$

If neither of the variables has any tied
score,

$$\tau = \frac{4 \Sigma C_i - n (n - 1)}{n (n - 1)}.$$


2. Alternative method :

After assigning ranks to the scores of the
variables and arranging those ranks in pairs
along two columns as given in steps (a) and
(b) of the preceding method, an alternative
method may be followed for the rest of the
procedure.

(c) Starting from the top of the column of
the paired $R_2$ ranks, each $R_2$ rank is in turn
taken as the pivotal rank and compared with
all subsequent $R_2$ ranks in that column. But
each subsequent $R_2$ rank is counted here as +1,
0 or -1, according as it is higher than, equal to
(tied with) or lower than the particular pivotal
rank (Table 8.13). The algebraic sum of these
counts for each pivotal rank gives the total
count C for the latter. After using all the $R_2$
ranks as pivotal ranks in turn, their C counts
are added to give $\Sigma C$.

(d) Correction terms ($\Sigma T_1$ and $\Sigma T_2$) are
computed for the tied scores of the respective
variables in the same way as in the preceding
method.

(e) If both the variables have tied scores,

$$\tau = \frac{2 \Sigma C}{\sqrt{[n (n - 1) - \Sigma T_1][n (n - 1) - \Sigma T_2]}}.$$

If neither of the variables has any tied
score,

$$\tau = \frac{2 \Sigma C}{n (n - 1)}.$$

Inaccuracies : 

(a) All rank statistics suffer from an
inaccuracy owing to the average ranks given to
tied scores, instead of their true ranks ; this is,
however, minimized for tau by the correction
terms, $\Sigma T_1$ and $\Sigma T_2$, in its computation.

(b) Another inaccuracy results from the
unequal differences between scores given
successive ranks.

Significance of tau 

(a) For small samples (n ≤ 10), the
computed $\tau$ is compared with the critical $\tau$
values for the given sample size, taken from a
standard table. The computed $\tau$ is significant at
or below the level whose critical $\tau$ is either
equal to or lower than the computed $\tau$ (P ≤ α).

(b) In case of larger samples (n > 10), tau
has symmetric and near-normal sampling
distributions, enabling its conversion to
Student’s t for testing the $H_0$ which contends
that there is no significant correlation.

$$s_\tau = \sqrt{\frac{2 (2n + 5)}{9 n (n - 1)}} ; \quad t = \frac{\tau}{s_\tau} = \tau \sqrt{\frac{9 n (n - 1)}{2 (2n + 5)}} ; \quad df = \infty.$$

The computed $\tau$ is significant at or below
that level of significance whose critical t score
is either equal to or lower than the computed t
(P ≤ α).

Instead of t, z score may also be computed
from the tau, using the same formula as for t,
and then interpreted with reference to unit
normal curve areas (Table A of Appendix).
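The two counting procedures and the tie correction can be sketched as follows. This is an illustration under stated assumptions rather than a definitive routine: the function names are invented here, the $R_1$ ranks are assumed to be free of ties and already arranged in ascending order, and the $R_2$ column used is an illustrative set of twelve paired ranks containing three tied sets.

```python
import math
from collections import Counter

def count_first_method(r2):
    """Sum of C_i: each later rank counts 1, 0.5 or 0 if higher, tied or lower."""
    total = 0.0
    for i, pivot in enumerate(r2):
        for later in r2[i + 1:]:
            if later > pivot:
                total += 1
            elif later == pivot:
                total += 0.5
    return total

def count_second_method(r2):
    """Sum of C: each later rank counts +1, 0 or -1 if higher, tied or lower."""
    total = 0
    for i, pivot in enumerate(r2):
        for later in r2[i + 1:]:
            if later > pivot:
                total += 1
            elif later < pivot:
                total -= 1
    return total

def tie_correction(ranks):
    """Sum of t(t - 1) over the tied sets; untied ranks contribute 0."""
    return sum(t * (t - 1) for t in Counter(ranks).values())

def kendall_tau(r1, r2):
    """Tau by both methods; r1 must be in ascending order, r2 paired with it."""
    n = len(r2)
    nn = n * (n - 1)
    denominator = math.sqrt((nn - tie_correction(r1)) * (nn - tie_correction(r2)))
    tau_first = (4 * count_first_method(r2) - nn) / denominator
    tau_second = 2 * count_second_method(r2) / denominator
    return tau_first, tau_second

def se_tau(n):
    """Standard error of tau used for the t (or z) conversion when n > 10."""
    return math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))

r1 = list(range(1, 13))                                    # untied, ascending
r2 = [7.5, 2.5, 2.5, 1, 5, 5, 7.5, 5, 9, 12, 11, 10]       # paired ranks with ties
tau_first, tau_second = kendall_tau(r1, r2)
print(round(tau_first, 2), round(tau_second, 2), round(se_tau(len(r2)), 3))
```

Because $2 \Sigma C = 4 \Sigma C_i - n(n - 1)$, the two methods necessarily return the same tau; the choice between them is only a matter of convenience in hand counting.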




Example 8.6.1. 

Compute tau between pulse rates and respiratory rates per minute, using the following data, to find whether
there is a significant correlation between the variables.

Individual       :   1    2    3    4    5    6    7    8    9    10   11   12
Respiratory rate :  15   16   12   21   17   13   18   11   14    20   19   22
Pulse rate       :  72   72   70   82   75   70   72   75   68    84   79   80

Solution : 


1. First method :

(a) Ranks are assigned separately and in ascending orders to the scores of the two variables, giving an average
rank to each score of a tied set (Table 8.11).

(b) Because respiratory rates show no tied score and no average rank, its ranks ($R_1$) are serially arranged
in an ascending order in Table 8.12, pairing each $R_1$ with the rank $R_2$ of the pulse rate of the same
individual.

(c) As the $R_1$ ranks include no average rank for tied scores, $\Sigma T_1$ for $R_1$ amounts to 0 : $\Sigma T_1 = \Sigma [t_1 (t_1 - 1)]$
= 0. But the $R_2$ ranks show three tied sets, viz., two ranks of 7.5, two ranks of 2.5 and three ranks of 5.

$\Sigma T_2 = \Sigma [t_2 (t_2 - 1)]$ = 2(2 - 1) + 2(2 - 1) + 3(3 - 1) = 10.

(d) Starting from the top of the column of paired ranks ($R_2$), each of the latter is taken in turn as the
pivotal rank and compared with all subsequent $R_2$ ranks of that column. Each subsequent $R_2$ rank is counted
as 1, 0.5 or 0, according as it is higher than, equal to (tied with) or lower than the pivotal rank under
consideration. The count of all the subsequent ranks for each pivotal $R_2$ is entered as the $C_i$ of the latter in
the table. Thus, for the first pivotal rank 7.5, the subsequent lower ranks 2.5, 2.5, 1, 5, 5 and 5 are counted
as 0 each, the subsequent tied rank 7.5 is counted as 0.5, and each of the subsequent higher ranks, 9, 12, 11
and 10, is counted as 1 ; so, the count of ranks ($C_i$) totals 4.5 for the first pivotal rank 7.5. This is repeated
for each $R_2$ rank in turn, comparing it with the $R_2$ ranks following it, but not with those preceding it in that
column. The $C_i$ values of all the $R_2$ ranks are then totalled to give the $\Sigma C_i$ of 51.5.


Table 8.11. Assigning ranks to scores of pulse and respiratory rates.

Individuals     Respiratory rate             Pulse rate
                Score (X)    Rank (R1)       Score (Y)    Rank (R2)
     1             15            5              72            5
     2             16            6              72            5
     3             12            2              70            2.5
     4             21           11              82           11
     5             17            7              75            7.5
     6             13            3              70            2.5
     7             18            8              72            5
     8             11            1              75            7.5
     9             14            4              68            1
    10             20           10              84           12
    11             19            9              79            9
    12             22           12              80           10



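Table 8.12. Count of ranks for tau (first method).

  R1      R2       Count of subsequent ranks ($C_i$)
   1      7.5      0 + 0 + 0 + 0 + 0 + 0.5 + 0 + 1 + 1 + 1 + 1      = 4.5
   2      2.5      0.5 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1          = 8.5
   3      2.5      0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1                = 8
   4      1        1 + 1 + 1 + 1 + 1 + 1 + 1 + 1                    = 8
   5      5        0.5 + 1 + 0.5 + 1 + 1 + 1 + 1                    = 6
   6      5        1 + 0.5 + 1 + 1 + 1 + 1                          = 5.5
   7      7.5      0 + 1 + 1 + 1 + 1                                = 4
   8      5        1 + 1 + 1 + 1                                    = 4
   9      9        1 + 1 + 1                                        = 3
  10     12        0 + 0                                            = 0
  11     11        0                                                = 0
  12     10                                                         = 0
Total                                                                 51.5 ($\Sigma C_i$)

$$\tau = \frac{4 \Sigma C_i - n (n - 1)}{\sqrt{[n (n - 1) - \Sigma T_1][n (n - 1) - \Sigma T_2]}} = \frac{4 \times 51.5 - 12 (12 - 1)}{\sqrt{[12 (12 - 1) - 0][12 (12 - 1) - 10]}} = + 0.58.$$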


2. Alternative method :

(a)-(c) Same as in the previous method, using Tables 8.11 and 8.13.

(d) Each paired rank ($R_2$) of the second column (Table 8.13) is taken in turn as the pivotal rank and the
subsequent ranks of that column are counted as +1, 0 or -1, according as they are higher than, equal to or
lower than the pivotal rank.

Table 8.13. Count of ranks for tau (second method).

  R1      R2       Count of subsequent ranks (C)
   1      7.5      - 1 - 1 - 1 - 1 - 1 + 0 - 1 + 1 + 1 + 1 + 1      = - 2
   2      2.5      0 - 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1            = 7
   3      2.5      - 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1              = 7
   4      1        1 + 1 + 1 + 1 + 1 + 1 + 1 + 1                    = 8
   5      5        0 + 1 + 0 + 1 + 1 + 1 + 1                        = 5
   6      5        1 + 0 + 1 + 1 + 1 + 1                            = 5
   7      7.5      - 1 + 1 + 1 + 1 + 1                              = 3
   8      5        1 + 1 + 1 + 1                                    = 4
   9      9        1 + 1 + 1                                        = 3
  10     12        - 1 - 1                                          = - 2
  11     11        - 1                                              = - 1
  12     10        0                                                = 0
Total                                                                 37 ($\Sigma C$)


$\Sigma T_1$ = 0 ; $\Sigma T_2 = \Sigma [t_2 (t_2 - 1)]$ = 2(2 - 1) + 2(2 - 1) + 3(3 - 1) = 10.

$$\tau = \frac{2 \Sigma C}{\sqrt{[n (n - 1) - \Sigma T_1][n (n - 1) - \Sigma T_2]}} = \frac{2 \times 37}{\sqrt{[12 (12 - 1) - 0][12 (12 - 1) - 10]}} = + 0.58.$$


Interpretation :

As the sample size (n = 12) exceeds 10, a two-tail t test is performed using the computed $\tau$, to test the
$H_0$ of no significant correlation.

$$s_\tau = \sqrt{\frac{2 (2n + 5)}{9 n (n - 1)}} = \sqrt{\frac{2 (2 \times 12 + 5)}{9 \times 12 (12 - 1)}} = 0.221 ; \quad t = \frac{\tau}{s_\tau} = \frac{0.58}{0.221} = 2.624 ; \quad df = \infty.$$

The computed t is compared with the two-tail critical t scores (df = ∞) from Table B of Appendix.

$t_{.02(\infty)}$ = 2.326 ; $t_{.01(\infty)}$ = 2.576 ; $t_{.001(\infty)}$ = 3.291.

The computed t being higher than the critical $t_{.01}$, the probability P of getting the computed $\tau$ by chance
is too low and amounts to < 0.01. So the $H_0$ is rejected, and the pulse and respiratory rates are considered
to have a significant correlation (P < 0.01).


Example 8.6.2. 

Find whether there is a significant correlation between the test scores of mechanical ability and technical
aptitude, using the following data for computing tau.

Student            :   1    2    3    4    5    6    7    8    9    10   11   12
Mechanical ability :  19   39   22   27   31   28   26   20   21    36   15    9
Technical aptitude :   6   24   10   13   20   13   10   13   18    14   12   11


Solution : 

1. First method :

(a) Ranks are assigned in ascending orders and separately to the scores of the two variables, giving an
average rank to each score of a tied set (Table 8.14).

(b) Because mechanical ability scores show no tie and consequently no average rank, their ranks ($R_1$) are
arranged in an ascending order in Table 8.15, pairing each $R_1$ with the rank $R_2$ of technical aptitude score of
the same individual.

(c) $R_1$ ranks include no average rank ; so, $\Sigma T_1$ = 0. $R_2$ ranks show two tied sets with average ranks,
viz., two ranks of 2.5 and three ranks of 7.

$\Sigma T_2 = \Sigma [t_2 (t_2 - 1)]$ = 2(2 - 1) + 3(3 - 1) = 8.

(d) Starting from the top of the column of paired ranks ($R_2$), each of the latter is taken in turn as the
pivotal rank and compared with all the $R_2$ ranks following it in that column, but not with those
preceding it. Each subsequent $R_2$ rank is counted as +1, 0 or -1, according as it is higher than, equal to (tied
with) or lower than the pivotal rank under consideration. The algebraic sum of all the counts for each pivotal
rank gives the total count C for the latter. For example, with the 4th pivotal $R_2$ of 7, the subsequent higher
ranks of 10, 11, 9 and 12 are each counted as +1, each subsequent tied rank of 7 is counted as 0, and the
subsequent lower ranks of 2.5 and 2.5 are each counted as -1 ; this gives the total count C of 2 for that pivotal
rank. The C values of all the $R_2$ ranks are added to give the $\Sigma C$ of 34.

Table 8.14. Ranking of mechanical ability and technical aptitude scores.

Student     Mechanical ability           Technical aptitude
            Score (X)    Rank (R1)       Score (Y)    Rank (R2)
   1           19            3               6            1
   2           39           12              24           12
   3           22            6              10            2.5
   4           27            8              13            7
   5           31           10              20           11
   6           28            9              13            7
   7           26            7              10            2.5
   8           20            4              13            7
   9           21            5              18           10
  10           36           11              14            9
  11           15            2              12            5
  12            9            1              11            4

Table 8.15. Table for computing tau between mechanical ability and technical aptitude scores.

  R1      R2       Count of subsequent ranks (C)
   1      4        1 - 1 + 1 + 1 - 1 - 1 + 1 + 1 + 1 + 1 + 1        = 5
   2      5        - 1 + 1 + 1 - 1 - 1 + 1 + 1 + 1 + 1 + 1          = 4
   3      1        1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1                = 9
   4      7        1 - 1 - 1 + 0 + 0 + 1 + 1 + 1                    = 2
   5     10        - 1 - 1 - 1 - 1 + 1 - 1 + 1                      = - 3
   6      2.5      0 + 1 + 1 + 1 + 1 + 1                            = 5
   7      2.5      1 + 1 + 1 + 1 + 1                                = 5
   8      7        0 + 1 + 1 + 1                                    = 3
   9      7        1 + 1 + 1                                        = 3
  10     11        - 1 + 1                                          = 0
  11      9        1                                                = 1
  12     12                                                         = 0
Total                                                                 34 ($\Sigma C$)

$$\tau = \frac{2 \Sigma C}{\sqrt{[n (n - 1) - \Sigma T_1][n (n - 1) - \Sigma T_2]}} = \frac{2 \times 34}{\sqrt{[12 (12 - 1) - 0][12 (12 - 1) - 8]}} = + 0.53.$$


2. Alternative method :

(a)-(c) Same as in the preceding method, using Tables 8.14 and 8.16.





(d) Each paired rank ($R_2$) of the second column (Table 8.16) is taken in turn as the pivotal rank and the
subsequent ranks of that column are counted as 1, 0.5 and 0, according as they are higher than, equal to or
lower than the pivotal rank. These counts are entered in Table 8.16 and totalled for all the pivotal ranks to
give the $\Sigma C_i$.

Table 8.16. Count of ranks for tau (second method).

  R1      R2       Count of subsequent ranks ($C_i$)
   1      4        1 + 0 + 1 + 1 + 0 + 0 + 1 + 1 + 1 + 1 + 1        = 8
   2      5        0 + 1 + 1 + 0 + 0 + 1 + 1 + 1 + 1 + 1            = 7
   3      1        1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1                = 9
   4      7        1 + 0 + 0 + 0.5 + 0.5 + 1 + 1 + 1                = 5
   5     10        0 + 0 + 0 + 0 + 1 + 0 + 1                        = 2
   6      2.5      0.5 + 1 + 1 + 1 + 1 + 1                          = 5.5
   7      2.5      1 + 1 + 1 + 1 + 1                                = 5
   8      7        0.5 + 1 + 1 + 1                                  = 3.5
   9      7        1 + 1 + 1                                        = 3
  10     11        0 + 1                                            = 1
  11      9        1                                                = 1
  12     12        0                                                = 0
Total                                                                 50 ($\Sigma C_i$)

$\Sigma T_1$ = 0 ; $\Sigma T_2 = \Sigma [t_2 (t_2 - 1)]$ = 2(2 - 1) + 3(3 - 1) = 8.

$$\tau = \frac{4 \Sigma C_i - n (n - 1)}{\sqrt{[n (n - 1) - \Sigma T_1][n (n - 1) - \Sigma T_2]}} = \frac{4 \times 50 - 12 (12 - 1)}{\sqrt{[12 (12 - 1) - 0][12 (12 - 1) - 8]}} = + 0.53.$$

Interpretation :

A $z$ score (or a $t$ score) may be computed from $\tau$ to test the $H_0$ proposing no significant correlation.

$$s_\tau = \sqrt{\frac{2(2n+5)}{9n(n-1)}} = \sqrt{\frac{2(2 \times 12 + 5)}{9 \times 12(12-1)}} = 0.221 ;$$

$$z = \frac{\tau}{s_\tau} = \frac{0.53}{0.221} = 2.40.$$

The fractional area of the unit normal curve from its mean ($\mu = 0$) to the $z$ score of 2.40 is 0.4918 (Table A of Appendix). So, the probability $P$ of the $H_0$ being correct would be very low :

$P$ = 2 [0.5000 - (fractional area from $\mu$ to $z$)] = 2 (0.5000 - 0.4918) = 0.016.

So, the two variables have a significant correlation with each other ($P$ < 0.02).

[Instead of $z$, $t$ may also be computed as $\tau / s_\tau$ and compared with the critical $t$ scores ($df = \infty$) to find the level of significance.]
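The counting procedure and the significance test above can be sketched in Python. This is only an illustrative sketch: the function names are hypothetical, and the count is taken pairwise (+1 for a concordant pair, -1 for a discordant pair, 0 for a tie), which is assumed here to be equivalent to the pivotal-rank count of the first method when the $R_1$ column is in ascending order.

```python
from math import sqrt
from statistics import NormalDist


def sign(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def tie_sum(ranks):
    """Sum of t(t - 1) over every group of t tied ranks."""
    counts = {}
    for r in ranks:
        counts[r] = counts.get(r, 0) + 1
    return sum(t * (t - 1) for t in counts.values())


def kendall_tau(r1, r2):
    """Return (sum of counts, tau) for two lists of paired ranks."""
    n = len(r1)
    total_c = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie.
            total_c += sign(r1[j] - r1[i]) * sign(r2[j] - r2[i])
    denom = sqrt((n * (n - 1) - tie_sum(r1)) * (n * (n - 1) - tie_sum(r2)))
    return total_c, 2 * total_c / denom


def tau_z_test(tau, n):
    """Return (z, two-tail P) under the H0 of no correlation."""
    se = sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
    z = tau / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Paired ranks as listed in Table 8.16.
r1 = list(range(1, 13))
r2 = [4, 5, 1, 7, 10, 2.5, 2.5, 7, 7, 11, 9, 12]
total_c, tau = kendall_tau(r1, r2)
z, p = tau_z_test(tau, len(r1))
print(total_c, round(tau, 2), round(z, 2), round(p, 3))
```

The normal-curve area is taken from `statistics.NormalDist` instead of Table A, so the printed $z$ and $P$ may differ slightly from the hand-rounded values.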




8.7 POINT BISERIAL r

This is a specialized form of the product-moment r, used as a measure of linear correlation between a continuous measurement variable and a dichotomous nominal variable. It is applicable only when the dichotomous variable is genuinely dichotomous with an intervening gap between its two classes; e.g., sex, living-dead, success-failure, right-wrong, Rh+-Rh-, HIV positive-HIV negative, pregnant-nonpregnant, etc. For example, point biserial r ($r_{pbi}$) may be computed for correlating either sex or pregnant-nonpregnant states with the serum cholesterol values.

In giving the final form to a psychological test, $r_{pbi}$ may be computed to correlate the dichotomously scored right-wrong answers to a test item with the continuous series of total scores of the entire test ; this item-total correlation is often undertaken for the selection of test items.

Assumptions

For applying point biserial r, it should be justifiable to assume that :

(a) one of the variables is a continuous measurement variable while the other is a genuinely dichotomous variable which is not expected to yield a continuous or normal distribution even on further exploration ;

(b) the continuous measurement variable involved has a normal or near-normal distribution in the population, without much skewness ;

(c) in each class of the dichotomous variable, each score of the continuous variable occurs at random and independent of all other scores ;

(d) there is a linear relation between the variables.

Computation

Where $p$ and $q$ are the proportions of cases or individuals in the two classes of the dichotomous variable, $\bar{X}_p$ and $\bar{X}_q$ are the means of the continuous variable for individuals belonging to the respective classes of the dichotomous variable, $\bar{X}$ is the grand mean and $s_x$ is the SD of all scores of the continuous variable, and $n$ is the number of individuals in the sample,

$$r_{pbi} = \frac{\bar{X}_p - \bar{X}_q}{s_x}\sqrt{pq} ;$$

$$\text{or, } r_{pbi} = \frac{\bar{X}_p - \bar{X}}{s_x}\sqrt{\frac{p}{q}}.$$

The computation of $p$ and $q$ may be avoided by using $n_p$ and $n_q$, which are the numbers of individuals in the two classes.

$$r_{pbi} = \frac{\bar{X}_p - \bar{X}_q}{n s_x}\sqrt{n_p n_q} ;$$

$$\text{or, } r_{pbi} = \frac{\bar{X}_p - \bar{X}}{s_x}\sqrt{\frac{n_p}{n_q}}.$$

The values of $r_{pbi}$ range from -1.00 to +1.00, but depend on the point of dichotomy of the nominal variable. So, the greater the difference between $p$ and $q$ in the sample, the smaller is the value of $r_{pbi}$.

Significance

Where the $H_0$ proposes that there is no correlation between the variables in the population, a two-tail $t$ test is done. Where $s_{r_{pbi}}$ is the SE of $r_{pbi}$,

$$s_{r_{pbi}} = \sqrt{\frac{1 - r_{pbi}^2}{n-2}} ; \quad t = \frac{r_{pbi}}{s_{r_{pbi}}} ; \quad df = n - 2.$$

Alternatively, for a large sample,

$$s_{r_{pbi}} = \frac{1}{\sqrt{n}} ; \quad t = r_{pbi}\sqrt{n} ; \quad df = \infty.$$

The computed $t$ is compared with the two-tail critical $t$ scores. The computed $r_{pbi}$ is considered significant at or below that level of significance whose critical $t$ score is either equal to or lower than the computed $t$ ($P \le \alpha$).


Example 8.7.1. 

The mean vital capacity was found to be 5.32 litres in 77 nonpregnant women and 4.26 litres in 67 pregnant women. The SD of vital capacity for the whole group of 144 women was found to be 1.452 litres. Work out an appropriate correlation coefficient between vital capacity and pregnant-nonpregnant states, and interpret the result.

Solution :

Because vital capacity is a continuous measurement variable and pregnant-nonpregnant states constitute a genuine dichotomous variable, $r_{pbi}$ should be computed.

$n_p = 77$ ; $n_q = 67$ ; $n = n_p + n_q = 77 + 67 = 144$.

$$p = \frac{n_p}{n} = \frac{77}{144} = 0.535 ; \quad q = \frac{n_q}{n} = \frac{67}{144} = 0.465.$$

$\bar{X}_p = 5.32$ L ; $\bar{X}_q = 4.26$ L ; $s_x = 1.452$ L.

$$r_{pbi} = \frac{\bar{X}_p - \bar{X}_q}{s_x}\sqrt{pq} = \frac{5.32 - 4.26}{1.452}\sqrt{0.535 \times 0.465} = +0.36.$$

$$\left[\text{or, } r_{pbi} = \frac{\bar{X}_p - \bar{X}_q}{n s_x}\sqrt{n_p n_q} = \frac{5.32 - 4.26}{144 \times 1.452}\sqrt{77 \times 67} = +0.36.\right]$$

A two-tail $t$ test is done for finding the significance of the computed $r_{pbi}$. Because the sample is large ($n$ = 144),

$$s_{r_{pbi}} = \frac{1}{\sqrt{n}} = \frac{1}{\sqrt{144}} = 0.083 ; \quad t = \frac{r_{pbi}}{s_{r_{pbi}}} = \frac{0.36}{0.083} = 4.337 ; \quad df = \infty.$$

Two-tail critical $t_{.001(\infty)}$ amounts to 3.291 (Table B of Appendix). Because the computed $t$ of 4.337 is higher than even the critical $t_{.001}$, the probability $P$ of correctness of the $H_0$, proposing no correlation, amounts to less than 0.001 and is considered too low. So, there is a significant correlation between the variables ($P$ < 0.001).
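The computation from summary statistics can be checked with a short Python sketch. The function names are hypothetical, and the code keeps full precision, so its $t$ differs a little from the hand value of 4.337, which is obtained from the rounded figures 0.36 and 0.083.

```python
from math import sqrt


def point_biserial_from_summary(mean_p, mean_q, sd_total, n_p, n_q):
    """r_pbi from the class means, the over-all SD and the class sizes."""
    n = n_p + n_q
    p, q = n_p / n, n_q / n
    return (mean_p - mean_q) / sd_total * sqrt(p * q)


def large_sample_t(r_pbi, n):
    """t for a large sample, where the SE of r_pbi is taken as 1/sqrt(n)."""
    return r_pbi * sqrt(n)


# Data of Example 8.7.1: nonpregnant women (class p), pregnant women (class q).
r_pbi = point_biserial_from_summary(5.32, 4.26, 1.452, 77, 67)
t = large_sample_t(r_pbi, 144)
print(round(r_pbi, 2), round(t, 2))
```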


Example 8.7.2. 

Find if there is a significant linear correlation between sex and pulmonary minute ventilation (L min$^{-1}$), using the following data of the minute ventilation in 12 men and 8 women.

Men : 6.55, 7.50, 7.16, 9.00, 8.50, 6.24, 7.30, 8.20, 7.45, 8.22, 7.28, 8.00 
Women : 6.42, 7.35, 7.05, 6.76, 7.82, 6.60, 7.92, 7.88. 



Solution :

Sex is a genuine dichotomous variable while pulmonary ventilation is a continuous measurement variable.

(a) Proportions of cases ($p$ and $q$) in the two sexes are worked out.

$n_p = 12$ ; $n_q = 8$ ; $n = n_p + n_q = 12 + 8 = 20$.

$$p = \frac{n_p}{n} = \frac{12}{20} = 0.6 ; \quad q = \frac{n_q}{n} = \frac{8}{20} = 0.4.$$

(b) Means and over-all SD of the ventilation scores are computed (Table 8.17).

Table 8.17. Table for computing means and SD of pulmonary ventilation scores.

Pulmonary ventilation
Men (X_p)   Women (X_q)   X_p - X̄    (X_p - X̄)^2   X_q - X̄    (X_q - X̄)^2
 6.55         6.42         -0.91      0.8281        -1.04      1.0816
 7.50         7.35         +0.04      0.0016        -0.11      0.0121
 7.16         7.05         -0.30      0.0900        -0.41      0.1681
 9.00         6.76         +1.54      2.3716        -0.70      0.4900
 8.50         7.82         +1.04      1.0816        +0.36      0.1296
 6.24         6.60         -1.22      1.4884        -0.86      0.7396
 7.30         7.92         -0.16      0.0256        +0.46      0.2116
 8.20         7.88         +0.74      0.5476        +0.42      0.1764
 7.45                      -0.01      0.0001
 8.22                      +0.76      0.5776
 7.28                      -0.18      0.0324
 8.00                      +0.54      0.2916
Σ 91.40      57.80                    7.3362                   3.0090

$$\bar{X}_p = \frac{\Sigma X_p}{n_p} = \frac{91.40}{12} = 7.62 \text{ L} ; \quad \bar{X} = \frac{\Sigma X_p + \Sigma X_q}{n} = \frac{91.40 + 57.80}{20} = 7.46 \text{ L}.$$

$$s_x = \sqrt{\frac{\Sigma(X_p - \bar{X})^2 + \Sigma(X_q - \bar{X})^2}{n - 1}} = \sqrt{\frac{7.3362 + 3.0090}{20 - 1}} = 0.738 \text{ L}.$$

(c) $r_{pbi}$ is computed.

$$r_{pbi} = \frac{\bar{X}_p - \bar{X}}{s_x}\sqrt{\frac{p}{q}} = \frac{7.62 - 7.46}{0.738}\sqrt{\frac{0.6}{0.4}} = +0.266.$$

$$\left[\text{or, } r_{pbi} = \frac{\bar{X}_p - \bar{X}}{s_x}\sqrt{\frac{n_p}{n_q}} = \frac{7.62 - 7.46}{0.738}\sqrt{\frac{12}{8}} = +0.266.\right]$$

A two-tail $t$ test is undertaken to test the $H_0$ of no correlation between the variables. Because the sample is small ($n$ = 20),

$$s_{r_{pbi}} = \sqrt{\frac{1 - r_{pbi}^2}{n-2}} = \sqrt{\frac{1 - (0.266)^2}{20 - 2}} = 0.2272 ; \quad t = \frac{r_{pbi}}{s_{r_{pbi}}} = \frac{0.266}{0.2272} = 1.171.$$



$df = n - 2 = 20 - 2 = 18$.

Two-tail critical $t$ scores ($df$ = 18) are quoted from Table B of Appendix.

$t_{.05(18)} = 2.101$ ; $t_{.10(18)} = 1.734$.

Because the computed $t$ is lower than even the critical $t_{.10}$, the probability $P$ of the $H_0$ being correct is too high ; there is no significant correlation between the variables ($P$ > 0.10).
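Starting from the raw scores instead of summary statistics, the same example can be sketched in Python. The function names are hypothetical. Because the code does not round the means before subtracting, its $r_{pbi}$ comes out slightly below the hand-computed +0.266, and the conclusion of the $t$ test is unchanged.

```python
from math import sqrt
from statistics import mean, stdev

# Minute ventilation scores of Example 8.7.2.
men = [6.55, 7.50, 7.16, 9.00, 8.50, 6.24, 7.30, 8.20, 7.45, 8.22, 7.28, 8.00]
women = [6.42, 7.35, 7.05, 6.76, 7.82, 6.60, 7.92, 7.88]


def point_biserial(class_p, class_q):
    """r_pbi computed from the raw scores of the two classes."""
    scores = class_p + class_q
    n_p, n_q = len(class_p), len(class_q)
    s_x = stdev(scores)  # over-all SD with the (n - 1) denominator
    return (mean(class_p) - mean(scores)) / s_x * sqrt(n_p / n_q)


def small_sample_t(r_pbi, n):
    """Return (t, df) using the small-sample SE of r_pbi."""
    se = sqrt((1 - r_pbi ** 2) / (n - 2))
    return r_pbi / se, n - 2


r_pbi = point_biserial(men, women)
t, df = small_sample_t(r_pbi, len(men) + len(women))
print(round(r_pbi, 3), round(t, 2), df)
```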

Example 8.7.3. 

Using the data presented in the first four columns of Table 8.18, find if there is a significant correlation ($\alpha$ = 0.05) between the total test scores of 200 students and their yes/no answers in a dichotomously scored test item in a psychological test.

Solution :

The total test scores constitute a continuous measurement variable while the yes/no answers to the test item constitute a genuine dichotomous variable. So, $r_{pbi}$ is computed for the item-total correlation.



Table 8.18. Table for computing the means of psychological test scores.

Total          Midpoint    Number of students                              f_p X_c    f_q X_c
test scores    (X_c)       'yes' to test item (f_p)   'no' to test item (f_q)
91 - 105        98           5                          0                   490          0
76 - 90         83          25                          8                  2075        664
61 - 75         68          45                         20                  3060       1360
46 - 60         53          28                         25                  1484       1325
31 - 45         38          15                         10                   570        380
16 - 30         23           6                          7                   138        161
 1 - 15          8           2                          4                    16         32
Total                      126 (n_p)                   74 (n_q)            7833       3922

(a) Proportions ($p$ and $q$) of students answering respectively 'yes' and 'no' to the given test item are worked out.

$$p = \frac{n_p}{n_p + n_q} = \frac{126}{126 + 74} = 0.63 ; \quad q = \frac{n_q}{n_p + n_q} = \frac{74}{126 + 74} = 0.37.$$

(b) Mean test scores ($\bar{X}_p$ and $\bar{X}_q$) of students answering respectively 'yes' and 'no' to the test item and also the grand mean $\bar{X}$ of the total test scores are computed (Table 8.18).

$$\bar{X}_p = \frac{\Sigma f_p X_c}{n_p} = \frac{7833}{126} = 62.2 ; \quad \bar{X}_q = \frac{\Sigma f_q X_c}{n_q} = \frac{3922}{74} = 53.0 ;$$

$$\bar{X} = \frac{\Sigma f_p X_c + \Sigma f_q X_c}{n_p + n_q} = \frac{7833 + 3922}{126 + 74} = 58.8.$$








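(c) The over-all SD of total test scores is computed using $x$ (Table 8.19).

$$s_x = \sqrt{\frac{\Sigma f_p x^2 + \Sigma f_q x^2}{n_p + n_q - 1}} = \sqrt{\frac{46415.64 + 30839.36}{126 + 74 - 1}} = 19.70.$$

(d) $r_{pbi}$ is computed as follows.

$$r_{pbi} = \frac{\bar{X}_p - \bar{X}_q}{s_x}\sqrt{pq} = \frac{62.2 - 53.0}{19.70}\sqrt{0.63 \times 0.37} = +0.23 ;$$

$$\left[\text{or, } r_{pbi} = \frac{\bar{X}_p - \bar{X}}{s_x}\sqrt{\frac{p}{q}} = \frac{62.2 - 58.8}{19.70}\sqrt{\frac{0.63}{0.37}} = +0.23.\right]$$

Alternatively,

$$r_{pbi} = \frac{\bar{X}_p - \bar{X}}{s_x}\sqrt{\frac{n_p}{n_q}} = \frac{62.2 - 58.8}{19.70}\sqrt{\frac{126}{74}} = +0.23.$$

The $H_0$ contends that there is no significant item-total correlation. A two-tail $t$ test is done to find the $P$ of this $H_0$ being correct. Because the sample size is large ($n$ = 200),

$$t = r_{pbi}\sqrt{n} = 0.23\sqrt{200} = 3.253 ; \quad df = \infty.$$

$\alpha$ = 0.05 ; critical $t_{.05(\infty)}$ = 1.960.

As the computed $t$ exceeds the chosen critical $t_{.05}$, there is a significant item-total correlation ($P$ < 0.05).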


8.8 BISERIAL r

This is a specialized form of the product-moment r, used for simple linear correlation between (i) a continuous measurement variable and (ii) either an apparently dichotomous variable, seen as consisting of two distinct classes in the data but expected reasonably to yield continuous series of metric data on more intensive exploration, or an artificially dichotomized variable formed by bisecting the experimentally obtained data of a continuous variable at a point near the median of the latter.

Apparently dichotomous variables include variables such as pass-fail, homeotherm-poikilotherm, neurotic-non-neurotic, trained-untrained, practised-unpractised and athlete-nonathlete.

On the contrary, the metric scores of a continuous variable may sometimes have either a truncated distribution in the sample, with no score beyond certain scale values in spite of scores known to occur in the population even beyond such limits, or a skewed distribution in the sample in spite of the latter having been drawn from a population having a normal distribution of that variable. The observed distribution of scores of such a variable may be bisected into two classes at a point close to the median of those scores, so as to form an artificially dichotomized variable such as diabetic-nondiabetic, normal-hypercholesterolemic and nonhypertensive-hypertensive.

Biserial r may be computed for correlating any of the above two types of dichotomous variables with a continuous measurement variable, e.g., between diabetic-nondiabetic states and serum cholesterol values, hypertension-nonhypertension and serum cholesterol concentrations, diabetes-nondiabetes and systolic blood pressure scores, and trained-untrained states of athletes and their vital capacity values.

In psychology, biserial r is frequently used instead of $r_{pbi}$ for (i) an item-total correlation between a dichotomized test item and the total scores of a psychological test, on the assumption that the dichotomy of the test item represents an apparent dichotomy of a near-normally distributed variable. Biserial r may also be worked out for (ii) an item-criterion correlation between a dichotomized test item and the continuous scores of an external criterion, so as to assess the capacity of the test item in measuring the attribute.

The computed values of $r_b$ may cross the limits of +1.00 and -1.00 if continuous scores of the dichotomized variable are skewed in the population. Moreover, the further away is the point of dichotomy of the dichotomized variable from the median of the continuous scores underlying it, or the smaller the sample size ($n$), the higher is the SE of $r_b$ and consequently, the lower the dependability of the latter.

Assumptions

For applying biserial r, it should be justifiable to assume that :

(a) one of the variables is a continuous measurement variable yielding a continuous metric series of scores while the other is an apparently or artificially dichotomous variable which either would yield continuous metric data on further exploration or has been dichotomized from such continuous scores ;

(b) the continuous measurement variable involved has a normal or near-normal distribution in the population without much skewness ;

(c) the continuous metric data, underlying the dichotomous variable, have a unimodal and normal or near-normal distribution in the population ; in case of doubts about this assumption, $r_{pbi}$ should be preferred to $r_b$ ;

(d) the dichotomous variable has been dichotomized at a point not far from the median of the continuous distribution of its scores ; in other words, the proportion of cases in each class of the dichotomized variable should not be far different from 0.50 ;

(e) in each class of the dichotomized variable, every score of the continuous variable occurs at random and independent of other scores ;

(f) there is a linear relationship between the variables.



Computation 

Where $p$ and $q$ are the proportions of cases of the sample in two classes of the dichotomized variable, $y$ is the height of the ordinate of the unit normal curve at the point of division of the normal curve area into $p$ and $q$, $\bar{X}_p$ and $\bar{X}_q$ are the means of the continuous measurement variable for individuals belonging respectively to the classes $p$ and $q$ of the dichotomized variable, $\bar{X}$ is the grand mean and $s_x$ is the overall SD of all the continuous variable scores in the sample,

$$r_b = \frac{\bar{X}_p - \bar{X}_q}{s_x} \times \frac{pq}{y} ;$$

$$\text{or, } r_b = \frac{\bar{X}_p - \bar{X}}{s_x} \times \frac{p}{y}.$$

The ordinate $y$ is found out from the normal curve table (Table A of Appendix). It is the ordinate of the unit normal curve corresponding to that fractional area from its mean ($\mu = 0$) which equals the absolute difference (neglecting the algebraic sign) between $p$ and half the area (viz., 0.5000) of the unit normal curve.

If needed, the computed $r_b$ or $r_{pbi}$ may be converted to each other by the following formulae.

$$r_b = r_{pbi}\frac{\sqrt{pq}}{y} ; \quad r_{pbi} = r_b\frac{y}{\sqrt{pq}}.$$

Significance

Where $n$ exceeds 30, neither $p$ nor $q$ is less than 0.10, and the $H_0$ proposes that there is no correlation between the variables,

$$s_{r_b} = \frac{1}{y}\sqrt{\frac{pq}{n}} ; \quad z = \frac{r_b}{s_{r_b}}.$$

The computed $z$ is then referred to the table of normal curve areas (Table A) for determining the probability $P$ of the $H_0$ being correct.


Example 8.8.1. 


Using the data of Example 8.7.3, compute biserial r for item-total correlation of the relevant psychological test.

Solution :

(a) The following statistics, to be used here, have already been computed in Example 8.7.3.

$p = 0.63$ ; $q = 0.37$ ; $n_p = 126$ ; $n_q = 74$ ; $n = 126 + 74 = 200$.
$\bar{X}_p = 62.2$ ; $\bar{X}_q = 53.0$ ; $\bar{X} = 58.8$ ; $s_x = 19.70$.

(b) Since $p$ = 0.63, the absolute difference between $p$ and 0.5000 amounts to |0.63 - 0.50| = 0.1300. Thus, $y$ is the ordinate at the point limiting the area ($p$ - 0.50), or 0.1300, from $\mu$ in the unit normal curve. By interpolation from Table A of Appendix, $y$ amounts to 0.3776 for this area of 0.1300.

(c) $r_b$ may be computed by either of the following formulae.

$$\text{Either, } r_b = \frac{\bar{X}_p - \bar{X}_q}{s_x} \times \frac{pq}{y} = \frac{62.2 - 53.0}{19.70} \times \frac{0.63 \times 0.37}{0.3776} = +0.29 ;$$

$$\text{or, } r_b = \frac{\bar{X}_p - \bar{X}}{s_x} \times \frac{p}{y} = \frac{62.2 - 58.8}{19.70} \times \frac{0.63}{0.3776} = +0.29.$$

(d) To test the $H_0$ that there is no item-total correlation, the computed $r_b$ is converted to a $z$ score.

$$s_{r_b} = \frac{1}{y}\sqrt{\frac{pq}{n}} = \frac{1}{0.3776}\sqrt{\frac{0.63 \times 0.37}{200}} = 0.09 ; \quad z = \frac{r_b}{s_{r_b}} = \frac{0.29}{0.09} = 3.22.$$

Two-tail probability $P$ of the $H_0$ being correct is given by :

$P$ = 2 [0.5000 - (fractional area of unit normal curve from its $\mu$ to the $z$ score of 3.22)]
= 2 (0.5000 - 0.4994) = 0.0012.

As $P$ is too low, there is a significant item-total correlation ($P$ < 0.002).
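The biserial computation may be sketched in Python as follows. The function names are hypothetical, and the ordinate $y$ is obtained from `statistics.NormalDist` rather than from Table A, on the assumption that the point of division is the normal deviate below which the area equals $p$. The code keeps full precision, so its $z$ is a little lower than the hand value of 3.22, which uses the rounded 0.29 and 0.09.

```python
from math import sqrt
from statistics import NormalDist


def normal_ordinate(p):
    """Ordinate of the unit normal curve where the area is divided into p and 1 - p."""
    unit_normal = NormalDist()
    return unit_normal.pdf(unit_normal.inv_cdf(p))


def biserial_r(mean_p, mean_q, sd_total, p):
    """Return (r_b, y) from the class means, the over-all SD and the proportion p."""
    y = normal_ordinate(p)
    r_b = (mean_p - mean_q) / sd_total * (p * (1 - p) / y)
    return r_b, y


def biserial_z(r_b, y, p, n):
    """z score of r_b under the H0 of no correlation."""
    se = sqrt(p * (1 - p) / n) / y
    return r_b / se


# Statistics of Example 8.8.1.
r_b, y = biserial_r(62.2, 53.0, 19.70, 0.63)
z = biserial_z(r_b, y, 0.63, 200)
p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(y, 4), round(r_b, 2), round(z, 2), round(p_two_tail, 4))
```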


Example 8.8.2. 

The mean vital capacity amounted to 5.4 litres for 66 athletes and 4.6 litres for 134 nonathletes. The SD of vital capacity was 1.86 litres for the entire sample of 200 cases. Is there any significant correlation between vital capacity and athletic status ?

Solution :

Vital capacity is a continuous measurement variable while athlete-nonathlete status is an apparently dichotomous variable which may reasonably be expected to be a continuous variable on further exploration. So, $r_b$ is computed in this case.

(a) Proportions of cases ($p$ and $q$) in the two classes of the dichotomous variable are worked out.

$n_p = 66$ ; $n_q = 134$ ; $n = 66 + 134 = 200$.

$$p = \frac{n_p}{n} = \frac{66}{200} = 0.33 ; \quad q = \frac{n_q}{n} = \frac{134}{200} = 0.67.$$

(b) Since $p$ = 0.33, the absolute difference between $p$ and 0.5000 amounts to |0.33 - 0.50| = 0.1700 ; so, $y$ is the ordinate at the point limiting the area of 0.1700 from $\mu$ in the unit normal curve. From Table A, $y$ amounts to 0.3621 for this area of 0.1700.

(c) $r_b$ may be computed using either $\bar{X}_q$ or $\bar{X}$.

$\bar{X}_p = 5.4$ L ; $\bar{X}_q = 4.6$ L ; $s_x = 1.86$ L ;

$$\bar{X} = \frac{n_p\bar{X}_p + n_q\bar{X}_q}{n} = \frac{66 \times 5.4 + 134 \times 4.6}{200} = 4.86 \text{ L}.$$

$$r_b = \frac{\bar{X}_p - \bar{X}_q}{s_x} \times \frac{pq}{y} = \frac{5.4 - 4.6}{1.86} \times \frac{0.33 \times 0.67}{0.3621} = +0.26 ;$$

$$\text{or, } r_b = \frac{\bar{X}_p - \bar{X}}{s_x} \times \frac{p}{y} = \frac{5.4 - 4.86}{1.86} \times \frac{0.33}{0.3621} = +0.26.$$

(d) To test the $H_0$ that there is no correlation between the variables, the computed $r_b$ is converted to a $z$ score.

$$s_{r_b} = \frac{1}{y}\sqrt{\frac{pq}{n}} = \frac{1}{0.3621}\sqrt{\frac{0.33 \times 0.67}{200}} = 0.092 ; \quad z = \frac{r_b}{s_{r_b}} = \frac{0.26}{0.092} = 2.83.$$

$P$ = 2 [0.5000 - (fractional area of unit normal curve from $\mu$ to the $z$ score of 2.83)]
= 2 (0.5000 - 0.4977) = 0.0046.

As the $P$ is too low, there is a significant correlation between the variables ($P$ < 0.005).




8.9 PHI COEFFICIENT

Phi coefficient ($\phi$) is a nonparametric statistic of correlation between two genuinely dichotomous nominal variables, each having two classes with a genuine intervening gap ; e.g., living-dead, success-failure, right-wrong answers, HIV positive-negative, Rh+-Rh-, pregnant-nonpregnant, etc.

In psychology, $\phi$ is used for (i) item-to-item correlation between two dichotomously scored test items (e.g., scored as yes/no, 1/0 or right/wrong), (ii) item-criterion correlation between a dichotomously scored test item and a genuinely dichotomous external criterion representing the attribute under investigation, and (iii) assessment of the power of a test item to discriminate between the individuals falling in two classes of the criterion.

Assumptions

For applying $\phi$, it should be justifiable to assume that :

(a) both the variables are genuinely dichotomous with no reasonable expectation to yield any continuous series of data on more extensive exploration ;

(b) each of the variables has a bimodal distribution in the population with a genuine gap between its two classes.

Computation

(a) A fourfold (2 x 2-fold) contingency table is drawn. The top and bottom cells of the right column of the table are named A and C, respectively, while those of the left column are named B and D, respectively (Table 8.20). Classes of one variable are represented along the columns of the table, the right column for the high-value or positive class and the left column for the low-value or negative class. Classes of the other variable are represented along the rows of the table, the top row for the high-value or positive class and the bottom row for the low-value or negative class. Thus, cells B and C contain frequencies of discordant cases, i.e., the frequencies B and C of individuals or cases with positive or high values of one variable and negative or low values of the other. The cells A and D contain frequencies of concordant cases, i.e., the frequencies A and D of cases with respectively positive (or high) and negative (or low) values of both variables. A positive correlation is indicated where AD > BC, a negative correlation by BC > AD, and no correlation if AD = BC.

(b) Frequencies of individuals are entered in the cells depending on the high and low value status of the cases with respect to the variables. For example, the frequency of cases, belonging to the high-value class of the column-variable and the low-value class of the row-variable, is entered in the cell C. The total of cell frequencies of each row is entered as the marginal total ($f_r$) of that row while the total of cell frequencies of each column is entered as the marginal total ($f_c$) of that column (Table 8.20). Thus, (A + B) and (C + D) give the $f_r$ values while (A + C) and (B + D) give the $f_c$ values. The sample size $n$ equals $\Sigma f_r$ as well as $\Sigma f_c$.

(c) $\phi$ is computed as follows.

$$\phi = \frac{AD - BC}{\sqrt{(A + B)(A + C)(B + D)(C + D)}}.$$

The computed $\phi$ ranges from -1.00 to +1.00, but seldom reaches the extremes.

Significance

To test the $H_0$ proposing no correlation between the variables, the computed $\phi$ is converted to $\chi^2$ (chi square).

$$\chi^2 = n\phi^2 ; \quad df = 1.$$

The computed $\chi^2$ is compared with the critical $\chi^2$ values ($df$ = 1) from Table C of Appendix. The computed $\phi$ is significant at or below that level of significance which has its critical $\chi^2$ respectively equal to or lower than the computed $\chi^2$ ($P \le \alpha$).




Example 8.9.1. 

Out of 40 pregnant and 60 nonpregnant women, respectively 12 and 24 were found HIV-positive. Is there a significant correlation between pregnancy and HIV-positive test ?

Solution :

(a) The HIV negative-positive variable and the pregnant-nonpregnant variable are represented respectively along columns and rows of a 2 x 2-fold contingency table (Table 8.20). The HIV-positive class and the pregnant class are considered the positive classes of the respective variables and represented along the right column and the top row respectively. The frequencies of cases are then entered in the cells according to the classes of the two variables to which each case belongs. The marginal totals, $f_r$ and $f_c$, are computed for the rows and columns respectively.

Table 8.20. Fourfold contingency table for correlating HIV test results with pregnancy.

                HIV-negative    HIV-positive    Total (f_r)
Pregnant        28 (B)          12 (A)          40 (A + B)
Nonpregnant     36 (D)          24 (C)          60 (C + D)
Total (f_c)     64 (B + D)      36 (A + C)      100 (n)

(b) $\phi$ is computed using the cell frequencies and marginal totals.

$$\phi = \frac{AD - BC}{\sqrt{(A + B)(A + C)(B + D)(C + D)}} = \frac{12 \times 36 - 28 \times 24}{\sqrt{40 \times 36 \times 64 \times 60}} = -0.10.$$

(c) The computed $\phi$ is converted to $\chi^2$ for interpretation :

$$\chi^2 = n\phi^2 = 100 \times (-0.10)^2 = 1.00 ; \quad df = 1.$$

The critical $\chi^2_{.30(1)}$ amounts to 1.07 (Table C of Appendix). As the computed $\chi^2$ is lower than even this critical value, there is no significant correlation between pregnancy and HIV-test result ($P$ > 0.30).
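A minimal Python sketch of the phi computation is given below; the function names are hypothetical, and the cell names A, B, C and D follow Table 8.20. The code does not round $\phi$ before squaring, so its $\chi^2$ is marginally above the hand value of 1.00.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi from the four cell frequencies of a fourfold table.

    a and d are the concordant cells, b and c the discordant cells.
    """
    denom = sqrt((a + b) * (a + c) * (b + d) * (c + d))
    return (a * d - b * c) / denom


def phi_chi_square(phi, n):
    """Chi square (df = 1) corresponding to phi in a sample of n cases."""
    return n * phi ** 2


# Cell frequencies of Table 8.20: A = 12, B = 28, C = 24, D = 36.
phi = phi_coefficient(12, 28, 24, 36)
chi2 = phi_chi_square(phi, 12 + 28 + 24 + 36)
print(round(phi, 2), round(chi2, 2))
```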


Example 8.9.2. 

Of 90 persons taking a test with two test items, 25 persons answered both of them wrongly, 20 persons gave right answers to item 2 but wrong answers to item 1, and 10 persons answered item 2 wrongly but item 1 correctly. Comment on the item-to-item correlation between the two items.

Solution :

(a) The data are arranged in a 2 x 2-fold contingency table, recording the right and wrong answers of one item along respectively the top and bottom rows, and those of the other item along respectively the right and left columns (Table 8.21). The frequency of persons, giving each particular combination of right and wrong answers to the two test items, is entered in the specific cell for such combination. For example, the frequency of persons, answering item 1 wrongly but item 2 rightly, is entered in cell C. The marginal totals, $f_r$ and $f_c$, are worked out for the rows and columns respectively.

Table 8.21. Fourfold contingency table for correlating test items.

                Item 2
Item 1          wrong           right           Total (f_r)
right           10 (B)          35 (A)          45 (A + B)
wrong           25 (D)          20 (C)          45 (C + D)
Total (f_c)     35 (B + D)      55 (A + C)      90 (n)

(b) $\phi$ is computed, using the cell frequencies and the marginal totals.

$$\phi = \frac{AD - BC}{\sqrt{(A + B)(A + C)(B + D)(C + D)}} = \frac{35 \times 25 - 10 \times 20}{\sqrt{45 \times 55 \times 35 \times 45}} = +0.34.$$

(c) The computed $\phi$ is converted to $\chi^2$ for interpretation :

$$\chi^2 = n\phi^2 = 90 \times (0.34)^2 = 10.40 ; \quad df = 1.$$

Critical $\chi^2$ values : $\chi^2_{.001(1)}$ = 10.83, and $\chi^2_{.01(1)}$ = 6.64 (vide Table C of Appendix).

The computed $\chi^2$ is higher than the critical $\chi^2_{.01}$ ; so, the computed $\phi$ is significant below the 0.01 level. Thus, there is a significant correlation between the test items ($P$ < 0.01).


8.10 TETRACHORIC r

Tetrachoric r ($r_t$) is a measure of correlation between two dichotomous variables either (i) when they are only apparently dichotomous in the data and may yield normally distributed continuous series of scores on more extensive exploration, or (ii) when their continuous scores in the data have been artificially dichotomized into two classes although they have normal distributions in the population (pages 177-178). Such variables include diabetic-nondiabetic, normal-hypertensive, normal-neurotic, homeotherm-poikilotherm, trained-untrained, athlete-nonathlete, practised-unpractised, pass-fail, genetic crossover-noncrossover, etc.

In a psychological test, $r_t$ is often computed during item analysis for an item-criterion correlation between a dichotomously scored test item and an artificially dichotomized external criterion representing the attribute under study. Thus, $r_t$ may assess the power of a test item in exploring the attribute represented by the external criterion.

Assumptions

For computing tetrachoric r, it should be reasonable to assume that :

(a) the variables either have continuous metric data which have been artificially dichotomized, or are only apparently dichotomous and may yield continuous scores on further exploration ;

(b) such continuous series of scores of the dichotomized variables form unimodal and normal distributions in the population ;

(c) the point of dichotomy is close to the median of each variable so that neither of the proportions ($p$ and $q$) of its classes is far from 0.50 ;

(d) there exists a linear relationship between the continuous scores of the variables.

Computation

After the data have been arranged in a 2 x 2-fold contingency table in the same way as in case of the phi coefficient (§ 8.9), $r_t$ may be computed approximately by the following cosine-pi formula, the original formula being far more complicated. Using the cell frequencies (A, B, C and D) of the contingency table,

$$r_t = \cos\left(\frac{180^\circ \times \sqrt{BC}}{\sqrt{AD} + \sqrt{BC}}\right).$$

The cosine of the computed angle is worked out using a scientific calculator.

The value of $r_t$ ranges from -1.00 to +1.00. When AD = BC, the computed angle is 90° so that $r_t$ = cos 90° = 0. When BC = 0, the angle amounts to 0° so that $r_t$ = cos 0° = +1.00. When AD = 0, the computed angle comes to 180°, making $r_t$ = cos 180° = -1.00. If AD > BC, the angle is between 0° and 90° so that $r_t$ lies between 0 and +1.00. If BC > AD, the angle is between 90° and 180° so that $r_t$ lies between -1.00 and 0.

If the point of dichotomy is far from the median, one of the cell frequencies is far lower than the others and the $r_t$ is overestimated. So, $r_t$ should not be computed in case of extreme dichotomies.

The SE of $r_t$ is much higher than that of r. So, for similar stability, the sample size ($n$) for computing $r_t$ should be nearly double that for computing r. To test the $H_0$ of no correlation,

$$s_{r_t} = \frac{1}{y_1 y_2}\sqrt{\frac{p_1 p_2 q_1 q_2}{n}} ; \quad t = \frac{r_t}{s_{r_t}},$$

where $p_1$ and $q_1$ are the proportions of cases in two classes of one variable, $p_2$ and $q_2$ are the proportions in the corresponding classes of the other variable, and $y_1$ and $y_2$ are the ordinates of the unit normal curve at the points of dichotomy of the respective variables. The ordinates $y_1$ and $y_2$ are found from $p_1$ and $p_2$ in the same way as $y$ is found from $p$ for biserial r (§ 8.8). The computed $t$ is compared with critical $t$ scores for finding the significance.


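Example 8.10.1.

Out of 100 diabetic and 120 nondiabetic persons, respectively 49 and 20 were found hypertensive, the others having no hypertension. Is there a significant correlation between diabetes and hypertension ?

Solution :

Diabetic-nondiabetic and hypertensive-nonhypertensive states are artificially dichotomized variables based on continuous underlying scores. So, $r_t$ is computed. The data are arranged in a fourfold contingency table (Table 8.22).

Table 8.22. Fourfold contingency table for correlating diabetes and hypertension.

                Nonhypertensive    Hypertensive    Total (f_r)
Diabetic         51 (B)             49 (A)          100
Nondiabetic     100 (D)             20 (C)          120
Total (f_c)     151                 69              220 (n)

AD = 49 x 100 = 4900 ; BC = 51 x 20 = 1020 ; AD > BC.

$$r_t = \cos\left(\frac{180^\circ \times \sqrt{BC}}{\sqrt{AD} + \sqrt{BC}}\right) = \cos\left(\frac{180^\circ \times \sqrt{1020}}{\sqrt{4900} + \sqrt{1020}}\right) = \cos 56.39^\circ = +0.55.$$

The computed $r_t$ is subjected to a $t$ test for testing the $H_0$ of no correlation. The respective proportions are worked out from the marginal totals.

$$p_1 = \frac{100}{220} = 0.455 ; \quad q_1 = \frac{120}{220} = 0.545 ; \quad p_2 = \frac{69}{220} = 0.314 ; \quad q_2 = \frac{151}{220} = 0.686.$$

The unit normal curve ordinates, $y_1$ and $y_2$ at the points of dichotomy of the respective variables, are found out from the differences between half the unit normal curve area, viz., 0.5000, and $p_1$ and $p_2$ respectively.

0.500 - $p_1$ = 0.500 - 0.455 = 0.045 ; $y_1$ = 0.396 (Table A).

0.500 - $p_2$ = 0.500 - 0.314 = 0.186 ; $y_2$ = 0.355 (Table A).

$$s_{r_t} = \frac{1}{y_1 y_2}\sqrt{\frac{p_1 p_2 q_1 q_2}{n}} = \frac{1}{0.396 \times 0.355}\sqrt{\frac{0.455 \times 0.314 \times 0.545 \times 0.686}{220}} = 0.111 ;$$

$$t = \frac{r_t}{s_{r_t}} = \frac{0.55}{0.111} = 4.955.$$

Critical $t$ scores ($df = \infty$) are quoted from Table B : $t_{.05(\infty)}$ = 1.960 ; $t_{.001(\infty)}$ = 3.291.

Because the computed $t$ of 4.955 is higher than even the critical $t_{.001}$, there is a significant correlation between diabetes and hypertension ($P$ < 0.001).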
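The cosine-pi approximation and its SE can be sketched in Python as below. The function names are hypothetical, the angle is written in the form $180^\circ\sqrt{BC}/(\sqrt{AD}+\sqrt{BC})$, and the ordinates are taken from `statistics.NormalDist` instead of Table A, so the results may differ in the last digit from the hand-rounded values. The sketch assumes that AD and BC are not both zero.

```python
from math import cos, radians, sqrt
from statistics import NormalDist


def tetrachoric_cos_pi(a, b, c, d):
    """Return (r_t, angle in degrees) by the cosine-pi approximation."""
    ad, bc = a * d, b * c
    angle = 180 * sqrt(bc) / (sqrt(ad) + sqrt(bc))
    return cos(radians(angle)), angle


def normal_ordinate(p):
    """Ordinate of the unit normal curve where the area is divided into p and 1 - p."""
    unit_normal = NormalDist()
    return unit_normal.pdf(unit_normal.inv_cdf(p))


def tetrachoric_se(a, b, c, d):
    """SE of r_t under the H0 of no correlation."""
    n = a + b + c + d
    p1, p2 = (a + b) / n, (a + c) / n  # positive-class proportions (row, column)
    y1, y2 = normal_ordinate(p1), normal_ordinate(p2)
    return sqrt(p1 * (1 - p1) * p2 * (1 - p2) / n) / (y1 * y2)


# Cell frequencies of Table 8.22: A = 49, B = 51, C = 20, D = 100.
r_t, angle = tetrachoric_cos_pi(49, 51, 20, 100)
se = tetrachoric_se(49, 51, 20, 100)
t = r_t / se
print(round(angle, 2), round(r_t, 2), round(se, 3), round(t, 2))
```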


8.11 CONTINGENCY COEFFICIENT 

Contingency coefficient (C) is a nonparametric correlation coefficient between two variables, either or both of which are divided into more than two classes with intervening gaps.

Assumptions

No assumption is needed for the genuineness of the discrete or discontinuous nature of distributions of the variables ; it is applicable both to variables with genuine gaps between their classes, and to those which either may be resolved into continuous data on further exploration or have been dichotomized on the basis of their continuous distributions. Nor is the assumption needed for normality or near-normality of their distributions. It should, however, be reasonable to assume that each individual or case occurs in the sample at random and independent of all others.

Computation

Contingency coefficient is computed from the observed frequencies ($f_o$) of the data and the expected frequencies ($f_e$) calculated on the basis of the $H_0$ which proposes that there is no significant correlation between the variables.

(a) A contingency table is framed with the classes of one variable represented along the columns and the classes of the other variable along the rows (Table 8.23).




(b) The frequencies ($f_o$) of cases are entered in the respective cells of the table according to the classes of the two variables to which those cases belong. The marginal total of each row ($f_r$) and that of each column ($f_c$) are also entered in the relevant marginal column and row, respectively.

(c) Where r and c are the numbers of respectively rows and columns containing cells with $f_o$ values, df = (r - 1)(c - 1).

(d) Choosing at random as many cells as the computed df, the $f_e$ corresponding to the $f_o$ of each such cell is worked out as follows :

$$ f_e = \frac{f_r f_c}{n} , $$

where $f_r$ and $f_c$ are the marginal totals of respectively the row and the column of the relevant cell, and n is the total frequency of cases.

The $f_e$ of each of the other cells is then computed by subtracting the sum of $f_e$ values, already worked out in that row (or column), from the relevant marginal total ($f_r$ or $f_c$).

(e) The sum of quotients (S) and the contingency coefficient (C) are then computed using the $f_o$ and $f_e$ of each cell.

$$ S = \sum \frac{f_o^2}{f_e} ; \qquad C = \sqrt{\frac{S - n}{S}} . $$

The value of C may range up to 1.00 but does not reach even 0.90 unless the contingency table is higher than 5 x 5-fold. It rises with the number (k) of classes of the variables, but suffers from errors if k is large because some $f_o$ values may then fall below 5.

Significance

The $H_0$ proposes that there is no correlation between the variables. The computed C is converted to $\chi^2$ to test this $H_0$.

$$ \chi^2 = \frac{nC^2}{1 - C^2} ; \qquad df = (r - 1)(c - 1). $$

The computed $\chi^2$ is compared with critical $\chi^2$ values for different levels of significance. The contingency coefficient may be considered significant at or below that level whose critical $\chi^2$ either equals or falls below the computed $\chi^2$ ($P \le \alpha$).

Example 8.11.1. 

Out of 60 people of Mongolian race, 14, 20, 17 and 9 had blood groups of respectively O, A, B and AB. But of 40 persons of Polynesian race, respectively 6, 10, 15 and 9 individuals belonged to those blood groups. Is there a significant correlation between race and blood group ?

Solution : 


Because blood group and race have 4 and 2 classes respectively, C is computed for their correlation.

(a) The data are arranged in a 4 x 2-fold contingency table (Table 8.23). Marginal totals ($f_r$ and $f_c$) of the rows and columns are computed.

df = (r - 1)(c - 1) = (2 - 1)(4 - 1) = 3 ; n = 60 + 40 = 100.

(b) The df being 3, the $f_e$ corresponding to the $f_o$ of any 3 cells, chosen at random, is computed. For example, for the cell corresponding to the Mongolian race and the A group,

$$ f_e = \frac{f_r f_c}{n} = \frac{60 \times 30}{100} = 18 . $$




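Table 8.23. Contingency table of race and blood group, with observed (f_o) and expected (f_e) frequencies.

Race           O group      A group      B group      AB group     Total (f_r)
               f_o   f_e    f_o   f_e    f_o   f_e    f_o   f_e
Mongolian       14    12     20    18     17    19      9    11       60
Polynesian       6     8     10    12     15    13      9     7       40
Total (f_c)     20    20     30    30     32    32     18    18      100 (n)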


Similarly, for the cell corresponding to the Polynesian race and the blood group O,

$$ f_e = \frac{f_r f_c}{n} = \frac{40 \times 20}{100} = 8 , $$

and for the cell corresponding to the Polynesian race and the B group,

$$ f_e = \frac{f_r f_c}{n} = \frac{40 \times 32}{100} = 12.8 \approx 13 . $$

The $f_e$ of each of the other cells is computed by subtracting the already computed $f_e$ values from the relevant $f_r$ or $f_c$ value. For example, the $f_e$ amounting to 18 for the Mongolian vs A group cell is deducted from the $f_c$ of 30 for the A group to give the $f_e$ amounting to 12 for the Polynesian vs A group cell.

(c) The statistics S and C are then computed, using $f_o$ and $f_e$ values of each cell.

$$ S = \sum \frac{f_o^2}{f_e} = \frac{14^2}{12} + \frac{6^2}{8} + \frac{20^2}{18} + \frac{10^2}{12} + \frac{17^2}{19} + \frac{15^2}{13} + \frac{9^2}{11} + \frac{9^2}{7} = 102.84 ; $$

$$ C = \sqrt{\frac{S - n}{S}} = \sqrt{\frac{102.84 - 100}{102.84}} = 0.166 . $$

(d) C is converted to $\chi^2$ for testing its significance.

$$ \chi^2 = \frac{nC^2}{1 - C^2} = \frac{100\,(0.166)^2}{1 - (0.166)^2} = 2.83 ; \qquad df = (r - 1)(c - 1) = (2 - 1)(4 - 1) = 3. $$

From Table C of Appendix, critical $\chi^2_{.05(3)}$ = 7.82, $\chi^2_{.30(3)}$ = 3.66, and $\chi^2_{.20(3)}$ = 4.64. Thus, the computed $\chi^2$ of 2.83 is lower than the critical $\chi^2_{.05}$ and the critical $\chi^2_{.30}$. So, the probability P of getting the computed $\chi^2$, and hence the computed C, by chance exceeds 0.30. Hence, there is no significant correlation between blood groups and races (P > 0.30).
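The steps of Example 8.11.1 can also be retraced programmatically. The following is a minimal sketch, not the authors' procedure: the function name `contingency_coefficient` is hypothetical, and every expected frequency is computed directly as $f_r f_c / n$ without rounding, instead of working out df cells and obtaining the rest by subtraction.

```python
import math

def contingency_coefficient(table):
    """Return (S, C, chi_square, df) for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # S is the sum of f_o^2 / f_e, where f_e = f_r * f_c / n for each cell.
    s = sum(
        f_o ** 2 / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, f_o in enumerate(row)
    )
    c = math.sqrt((s - n) / s)
    chi_square = n * c ** 2 / (1 - c ** 2)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return s, c, chi_square, df

observed = [
    [14, 20, 17, 9],  # Mongolian: O, A, B, AB
    [6, 10, 15, 9],   # Polynesian: O, A, B, AB
]
S, C, chi_square, df = contingency_coefficient(observed)
print(f"S = {S:.2f}, C = {C:.3f}, chi-square = {chi_square:.2f}, df = {df}")
```

Because the expected frequencies are not rounded to whole numbers here, C comes out near 0.164 rather than 0.166 and the chi-square near 2.77 rather than 2.83. Both remain below the critical value of 3.66, so the conclusion of the example is unchanged.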


8.12 REGRESSION

Regression is prediction statistics. It predicts the most likely value of a variable on the basis of the given value(s) of another or other variable(s). The variable, whose values are predicted, is the dependent variable or criterion ; the variable whose values form the basis of the prediction is called the independent variable or predictor. Regression can be carried out only if the dependent variable and the independent variable(s) possess significant correlation with each other. It translates the relation between two or more variables into an expression showing one of them as a function of the other(s). Regression, like correlation, holds good only in a particular population to which the sample belongs, and only for that limited range of scores of the variables from which it has been derived ; it cannot be extended beyond these limits.





Types of regression

Regression may be either simple or multiple according to the number of predictors involved. (a) In simple regression, the criterion or dependent variable is a function of a single independent variable or predictor ; the scores of the former are predicted from the given scores of the single predictor ; e.g., the regression of height of a person on his/her weight ; the regression of examination marks of a candidate in Mathematics on his/her numerical aptitude test score ; the regression of oxygen consumption on the tracheal ventilation of an insect. (b) In multiple regression, the criterion is a function of two or more predictors ; thus, its scores are predicted from the scores of more than one predictor ; e.g., the regression of surface area of a person on his/her height and weight ; the regression of Maths marks of an examinee on his/her numerical aptitude and abstract reasoning test scores.

Regression may again be either linear or nonlinear according as the relation between the criterion and the predictor can be described in terms of a straight line or a curved line. (a) Linear regression expresses the dependent variable Y as the linear function of the independent variable X. In other words, the scattergram of the scores of criterion, plotted against those of predictor, should show a linear distribution of its plotted points. (b) For a nonlinear regression, the scattergram should show a curvilinear distribution like elliptical and hyperbolic ones.

Models of regression 

There are three models of regression according to the nature(s) of the independent variable(s).

(a) Model I regression :

It is the regression of a dependent variable or criterion (Y) on an independent variable or predictor (X) which is a ‘fixed’ treatment variable (page 5). A model I regression is used in predicting the values of Y for specific values of X when the latter is varied by the investigator at precise and predetermined manners and rates. The values of Y suffer from errors due to random variations. But the values of the ‘fixed’ treatment variable X are free from random errors, because they vary under planned and deliberate control of the investigator and not at random. So, model I regression can estimate how much of the variations of Y may result from the variations of X and can thus explore the causation of the changes in the dependent variable Y due to the changes in the independent variable X, i.e., their cause-and-effect relation. In a simple model I regression, the lone predictor is a ‘fixed’ treatment variable ; e.g., the regression of tracheal ventilation of an insect on ‘fixed’ and predetermined doses of an insecticide ; the regression of blood sugar level on predetermined doses of injected insulin. In a multiple model I regression, based on a combination of two predictors, both of the latter are ‘fixed’ treatments.

(b) Model II regression :

It is the regression of a criterion (Y) on a predictor (X) which is a classification variable beyond the control of the investigator (page 5). It predicts the most likely value of Y on the basis of an already existing value of X in the individual ; X is measured, but not applied, by the investigator. Because the values of the predictor are not ‘fixed’, controlled or applied by the investigator, its values suffer from random errors. Thus, both the dependent and independent variables vary at random and have random errors. So, model II regression cannot explore the cause-and-effect relation between the variables. It merely expresses an interdependence of their changes, sometimes due to common causes. The regressions of cardiac stroke volume on venous return, of glomerular filtration rate on glomerular blood pressure, of tracheal ventilation of an insect on atmospheric temperature, and of examination marks in a language on the verbal ability test score of an examinee are examples of simple model II regression ; each uses a random predictor such as venous return, glomerular blood pressure, atmospheric temperature and verbal ability. The Dubois-Dubois formula for human body surface area is a multiple model II regression of surface area on height and weight of the individual.

(c) Model III regression :

This is always a multiple regression predicting the likely value of a dependent variable from the given values of two or more predictors, some of the latter being ‘fixed’ treatment(s) and the other(s) classification variable(s) ; e.g., the regression of blood thyroxine level on combinations of atmospheric temperature (classification variable) and injected dose of thyrotropin (‘fixed’ treatment).

Properties of simple linear regression

1. Linear regression of a variable Y on the basis of scores of another variable X, or vice versa, can be worked out only when the two variables have a significant linear correlation, $r_{YX}$ or $r_{XY}$.

2. The scattergram, resulting from the plotting of the predicted criterion scores ($\hat{Y}$) against the corresponding predictor scores (X) used in their regression, has a linear distribution with an upward or downward slope (page 34, Fig. 2.13).

3. The $\hat{Y}$ values, predicted from those of X, lie around a straight line called the regression line of Y on X. The sum of squared vertical distances of the points, plotted with the paired scores of the variables, from this line is kept at a minimum (method of least squares). In other words, it is the best-fitting straight line for those plotted points.

4. Two separate regression equations may be worked out for each pair of variables (X and Y). One of these is the regression equation of Y on X, predicting the Y scores on the basis of the X scores ; the other is the regression equation of X on Y, predicting the X scores on the basis of the Y scores. However, the regression of that one of two correlated variables is generally worked out, which is relatively more difficult to measure or is measured less precisely than the other. Thus, it is more practical to work out and use the regression of oxygen consumption on vital capacity than the other way around, because the latter is easier and less time-consuming to measure than the former.

5. The regression equation of Y on X is given by the following :

$$ \hat{Y} = a_{YX} + b_{YX} X , $$

or,

$$ \hat{Y} = \bar{Y} + b_{YX} (X - \bar{X}) , $$

where $\hat{Y}$ is the Y score predicted in an individual on the basis of the actually measured X score of the latter, $b_{YX}$ is the slope and $a_{YX}$ is the general level of the regression line showing Y as a linear function of X.

6. The statistic $b_{YX}$ is a prediction statistic and is called the regression coefficient of Y on X. It is the average rate of increase or decrease in the score of the criterion Y, for unit rise in the score of the predictor X. It is given basically by the ratio of the covariance of scores of both variables and the variance of scores of the predictor. Thus,

$$ b_{YX} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} ; $$

or,

$$ b_{YX} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} ; $$

or,

$$ b_{YX} = r_{YX}\,\frac{s_Y}{s_X} . $$




7. The statistic $a_{YX}$ is the Y-intercept of the regression line of Y on X.

$$ a_{YX} = \bar{Y} - b_{YX}\bar{X} . $$

8. Likewise, the regression equation of X on Y may also be worked out. Where $\hat{X}$ is the X score predicted on the basis of a given Y score, $b_{XY}$ is the regression coefficient of X on Y and gives the slope of the regression line, and $a_{XY}$ gives the general level of that line,

$$ b_{XY} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} ; $$

or,

$$ b_{XY} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2} ; $$

or,

$$ b_{XY} = r_{XY}\,\frac{s_X}{s_Y} \qquad (r_{XY} = r_{YX}). $$

$$ a_{XY} = \bar{X} - b_{XY}\bar{Y} . $$

$$ \hat{X} = a_{XY} + b_{XY} Y ; \quad \text{or,} \quad \hat{X} = \bar{X} + b_{XY} (Y - \bar{Y}). $$

9. These two regression equations as well as the two regression lines become identical with each other only when the product-moment r (viz., $r_{XY}$ or $r_{YX}$) equals either +1.00 or -1.00. For all other values of r, the regression lines intersect at a point which corresponds to the means ($\bar{X}$ and $\bar{Y}$) of the two variables.

10. The angle between the two regression lines increases with the decrease in the magnitude of r and reaches 90° when r amounts to 0.00 ; both $b_{YX}$ and $b_{XY}$ become 0 in value in such a case and no regression is possible.

11. The geometric mean of the two regression coefficients equals the product-moment r between the two variables :

$$ r = \sqrt{b_{YX}\, b_{XY}} . $$


the product-moment r between the variables 
amounts to either + 1.00 or -1.00. 

12. The predicted score of the criterion (say, $\hat{Y}$) is merely its most probable score in an individual having a given score (say, X) of the predictor. So, in any sample, all the individuals with a given X score are not expected to have their actual Y scores identical with the $\hat{Y}$ score predicted by regression on the given X score. Rather, the actual Y scores, experimentally measured in all such individuals, will form a normal or near-normal distribution around the relevant $\hat{Y}$ score with the latter as the mean.

13. For all values of the predictor score X, the experimentally measured Y scores of the criterion should be scattered to similar extents around the regression line. In other words, the deviations of observed Y scores from the respective predicted $\hat{Y}$ scores should obey homoscedasticity all along the regression line.


14. The difference between the predicted $\hat{Y}$ score and the observed Y scores of the criterion is called the error of prediction. It rises with the decrease in the magnitude of correlation and is measured as the SE of estimate ($s_{YX}$) of Y on X.

$$ s_{YX} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n}} = s_Y\sqrt{1 - r_{YX}^2} ; \qquad r_{YX} \ \text{or} \ r_{XY} = \sqrt{1 - \frac{s_{YX}^2}{s_Y^2}} . $$

Evidently, when $r_{YX}$ amounts to +1.00 or -1.00, $s_{YX}$ amounts to 0 and all the observed Y scores coincide with the respective $\hat{Y}$ scores so as to lie right on the regression line. On the contrary, when $r_{YX}$ amounts to 0.00, $s_{YX}$ equals $s_Y$ and no prediction is possible.
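Two of the properties listed above lend themselves to a quick numerical check : the geometric mean of the two regression coefficients equals r, and the SE of estimate obtained from the residuals agrees with $s_Y\sqrt{1 - r^2}$. The sketch below borrows the height and weight scores of Example 8.12.1 ; it assumes that the SD is computed with n in the denominator, and the variable names are illustrative only.

```python
import math

# Height (Y, cm) and weight (X, kg) of the ten students of Example 8.12.1.
Y = [165, 182, 170, 162, 160, 165, 170, 165, 176, 180]
X = [58.5, 64, 52, 48.5, 49.5, 59, 59, 58, 60, 67]
n = len(X)

mean_x, mean_y = sum(X) / n, sum(Y) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))  # sum of products
ss_x = sum((x - mean_x) ** 2 for x in X)                     # sum of squares of X
ss_y = sum((y - mean_y) ** 2 for y in Y)                     # sum of squares of Y

b_yx = sp / ss_x                   # regression coefficient of Y on X
b_xy = sp / ss_y                   # regression coefficient of X on Y
a_yx = mean_y - b_yx * mean_x      # Y-intercept of the line of Y on X
r = sp / math.sqrt(ss_x * ss_y)    # product-moment r

# Geometric mean of the two regression coefficients.
geometric_mean = math.sqrt(b_yx * b_xy)

# SE of estimate, once from r and once from the residuals (SD with n).
s_y = math.sqrt(ss_y / n)
se_from_r = s_y * math.sqrt(1 - r ** 2)
se_from_residuals = math.sqrt(
    sum((y - (a_yx + b_yx * x)) ** 2 for x, y in zip(X, Y)) / n
)

print(f"r = {r:.3f}, geometric mean = {geometric_mean:.3f}")
print(f"SE of estimate = {se_from_r:.3f} and {se_from_residuals:.3f}")
```

Since r is positive for these data, the geometric mean reproduces r directly ; for a negative correlation both coefficients are negative and the sign has to be restored separately.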






Assumptions for simple linear regression

It should be reasonable to assume that :

(a) the variables involved in regression are either continuous measurement variables or such apparently discrete variables as can be expected to yield continuous metric data on further exploration ;

(b) both the variables have unimodal and fairly symmetrical distributions in the population ;

(c) the scores (Y) of the dependent variable (criterion) are a linear function of the scores (X) of the independent variable (predictor) ; in other words, there is a significant product-moment r between the variables and their scattergram has a linear distribution ;

(d) the Y scores of the criterion, measured in a large number of individuals possessing a given X score of the predictor, are distributed normally, independent of each other and around the predicted $\hat{Y}$ score as the mean, and the variances of all such distributions around the respective $\hat{Y}$ scores obey homoscedasticity ;

(e) the predictor variable is either a ‘fixed’ experimental treatment in model I regression, or a classification variable in model II regression.

Computation of simple linear regression

The regression coefficient $b_{YX}$ for the regression of Y on X is computed by using any of the following formulae.

1. Using raw scores :

(a) The X and Y scores of the variables are totalled separately to give $\sum X$ and $\sum Y$ respectively.

(b) Each X score is squared and all these squared X scores are totalled to give $\sum X^2$.

(c) Each X score is multiplied by the Y score of the same individual to get XY of that individual, and all such XY values are totalled to give $\sum XY$.

(d) The statistic $b_{YX}$ is computed as follows using the sample size n (Example 8.12.1).

$$ b_{YX} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} . $$

2. Using the sum of products :

(a) The means ($\bar{X}$ and $\bar{Y}$) of the variables are computed using $\sum X$ and $\sum Y$.

$$ \bar{X} = \frac{\sum X}{n} ; \qquad \bar{Y} = \frac{\sum Y}{n} . $$

(b) The deviations of X and Y scores from their respective means are computed for each individual and then multiplied with each other to give $(X - \bar{X})(Y - \bar{Y})$. These products are added up to give the sum of products, $\sum (X - \bar{X})(Y - \bar{Y})$.

(c) The deviations of X scores from $\bar{X}$ are squared and the squared deviations are totalled to give the sum of squares of the predictor, viz., $\sum (X - \bar{X})^2$.

(d) The regression coefficient is then computed as follows (Example 8.12.2).

$$ b_{YX} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} . $$

3. Using product-moment r :

$b_{YX}$ may be computed using the product-moment r ($r_{YX}$) and the SDs ($s_X$ and $s_Y$) of the variables (Example 8.12.3).

$$ b_{YX} = r_{YX}\,\frac{s_Y}{s_X} . $$

4. Using covariance and variance :

(a) $\sum X$, $\sum Y$, $\sum X^2$ and $\sum XY$ are worked out as in the steps (a) to (c) of the computation from raw scores.

(b) Cov(X,Y) and variance ($s_X^2$) of X scores are then computed using the sample size n.




$$ \mathrm{Cov}(X,Y) = \frac{\sum XY}{n} - \frac{\sum X \sum Y}{n^2} ; \qquad s_X^2 = \frac{\sum X^2}{n} - \frac{(\sum X)^2}{n^2} . $$

(c) $b_{YX}$ is then computed as follows (Example 8.12.1).

$$ b_{YX} = \frac{\mathrm{Cov}(X,Y)}{s_X^2} . $$

The $b_{YX}$ computed by any of the preceding alternative methods is then used in working out $a_{YX}$.

$$ a_{YX} = \bar{Y} - b_{YX}\bar{X} . $$

The computed scores of $a_{YX}$ and $b_{YX}$ are then put in the following equation to give the regression equation of Y on X.

$$ \hat{Y} = a_{YX} + b_{YX} X . $$

To draw the linear regression line, a number of X scores are chosen at random from within the range of X in the sample. The $\hat{Y}$ score for each of them is computed using the regression equation. Each $\hat{Y}$ score is plotted graphically against the corresponding X score (Fig. 8.4).


Example 8.12.1. 

Work out the linear regression equation of height (cm) on weight (kg), using the following data from 10 college students.

Student :      1     2     3     4     5     6     7     8     9    10
Height :     165   182   170   162   160   165   170   165   176   180
Weight :    58.5    64    52  48.5  49.5    59    59    58    60    67

Solution :

1. First method using raw scores :


(a) The height (Y) and the weight (X) scores are totalled to give $\sum Y$ and $\sum X$ respectively (Table 8.24).

(b) Each Y score is multiplied by the corresponding X score and all these products are totalled to give $\sum XY$.

(c) Each X score is squared and the $X^2$ values are totalled to give $\sum X^2$ (Table 8.24).

$$ b_{YX} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{10 \times 97872.5 - 575.5 \times 1695}{10 \times 33439.75 - (575.5)^2} = 1.017 . $$




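2. Alternative method using covariance :

(a)-(c) same as in the preceding method.

(d) The covariance of X and Y scores and the variance of X scores are worked out and used in computing $b_{YX}$.

$$ \mathrm{Cov}(X,Y) = \frac{\sum XY}{n} - \frac{\sum X \sum Y}{n^2} = \frac{97872.5}{10} - \frac{575.5 \times 1695}{100} = 32.525 . $$

$$ s_X^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{33439.75}{10} - \left(\frac{575.5}{10}\right)^2 = 31.9725 . $$

$$ b_{YX} = \frac{\mathrm{Cov}(X,Y)}{s_X^2} = \frac{32.525}{31.9725} = 1.017 . $$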




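3. Computation of $a_{YX}$ and the regression equation :

(a) $a_{YX}$ is computed using the $b_{YX}$ worked out by any of the preceding methods.

$$ \bar{Y} = \frac{\sum Y}{n} = \frac{1695}{10} = 169.5 ; \qquad \bar{X} = \frac{\sum X}{n} = \frac{575.5}{10} = 57.55 . $$

$$ a_{YX} = \bar{Y} - b_{YX}\bar{X} = 169.5 - 1.017 \times 57.55 = 110.97 . $$

(b) The values of $a_{YX}$ and $b_{YX}$ are used to give the regression equation.

$$ \hat{Y} = a_{YX} + b_{YX} X ; \quad \text{or,} \quad \hat{Y} = 110.97 + 1.02\,X . $$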


Table 8.24. Table for computing regression coefficient from raw scores/covariance.

Student    Height (Y)    Weight (X)        XY          X²
   1          165           58.5         9652.5      3422.25
   2          182           64          11648        4096
   3          170           52           8840        2704
   4          162           48.5         7857        2352.25
   5          160           49.5         7920        2450.25
   6          165           59           9735        3481
   7          170           59          10030        3481
   8          165           58           9570        3364
   9          176           60          10560        3600
  10          180           67          12060        4489
   Σ         1695          575.5        97872.5     33439.75
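The four computational routes described for $b_{YX}$ (raw scores, sum of products, product-moment r with SDs, and covariance with variance) are algebraically equivalent. A minimal sketch with the data of Table 8.24 shows them agreeing ; the variable names are illustrative, and the SDs are taken with n in the denominator, as in the variance formula of the text.

```python
import math

# Height (Y, cm) and weight (X, kg) from Table 8.24.
Y = [165, 182, 170, 162, 160, 165, 170, 165, 176, 180]
X = [58.5, 64, 52, 48.5, 49.5, 59, 59, 58, 60, 67]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
mean_x, mean_y = sum_x / n, sum_y / n

# 1. Raw scores.
b_raw = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# 2. Sum of products over sum of squares of the predictor.
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)
b_sp = sp / ss_x

# 4. Covariance over variance of X (computed before method 3, which uses both).
cov = sum_xy / n - sum_x * sum_y / n ** 2
var_x = sum_x2 / n - sum_x ** 2 / n ** 2
b_cov = cov / var_x

# 3. Product-moment r and the two SDs.
s_x = math.sqrt(var_x)
s_y = math.sqrt(sum_y2 / n - sum_y ** 2 / n ** 2)
r = cov / (s_x * s_y)
b_r = r * s_y / s_x

a_yx = mean_y - b_raw * mean_x
print(f"b_YX = {b_raw:.3f}, {b_sp:.3f}, {b_r:.3f}, {b_cov:.3f}")
print(f"a_YX = {a_yx:.2f}")
```

All four values of $b_{YX}$ print as 1.017. The intercept is obtained here with the unrounded coefficient, so it may differ in the second decimal place from a hand computation that rounds $b_{YX}$ first.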



Example 8.12.2. 

Work out a linear regression equation of typewriting score on vocabulary test score using the following data.

Individual :          1    2    3    4    5    6    7    8    9
Vocabulary score :    8   22   35   19   23   13    2   14   11
Typewriting score :  29   48   55   45   50   35   18   38   30

Solution :

1. First method using the sum of products :

(a) The means of vocabulary scores (X) and typewriting scores (Y), the deviations of X and Y scores from their respective means, the sum of products of the deviations of X and Y scores from their respective means, and the sum of squared deviations of X scores from $\bar{X}$ are worked out (Table 8.25).




Table 8.25. Table for computing regression coefficient using the sum of products.

   X      Y     X - X̄     Y - Ȳ    (X - X̄)²   (X - X̄)(Y - Ȳ)
   8     29     - 8.3     - 9.7      68.89         80.51
  22     48     + 5.7     + 9.3      32.49         53.01
  35     55     +18.7     +16.3     349.69        304.81
  19     45     + 2.7     + 6.3       7.29         17.01
  23     50     + 6.7     +11.3      44.89         75.71
  13     35     - 3.3     - 3.7      10.89         12.21
   2     18     -14.3     -20.7     204.49        296.01
  14     38     - 2.3     - 0.7       5.29          1.61
  11     30     - 5.3     - 8.7      28.09         46.11
 147    348                          752.01        886.99

$$ \bar{X} = \frac{\sum X}{n} = \frac{147}{9} = 16.3 ; \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{348}{9} = 38.7 . $$

$$ \sum (X - \bar{X})^2 = 752.01 ; \qquad \sum (X - \bar{X})(Y - \bar{Y}) = 886.99 . $$

(b) The regression coefficient is computed using the sum of products and the sum of squares of X.

$$ b_{YX} = \frac{\text{Sum of products}}{\text{Sum of squares of } X} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{886.99}{752.01} = 1.18 . $$


2. Alternative method using raw scores :

(a) The X and Y scores are totalled to give $\sum X$ and $\sum Y$ respectively (Table 8.26).

(b) Each Y score is multiplied by the corresponding X score and all such products are totalled to give $\sum XY$.

(c) Each X score is squared and the $X^2$ values are totalled to give $\sum X^2$ (Table 8.26).

Table 8.26. Table for computing regression coefficient from raw scores.

   X       X²      Y      XY
   8       64     29      232
  22      484     48     1056
  35     1225     55     1925
  19      361     45      855
  23      529     50     1150
  13      169     35      455
   2        4     18       36
  14      196     38      532
  11      121     30      330
 147     3153    348     6571




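(d) The data of Table 8.26 are used in working out $b_{YX}$.

$$ b_{YX} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{9 \times 6571 - 147 \times 348}{9 \times 3153 - (147)^2} = 1.18 . $$

3. Computation of $a_{YX}$ and the regression equation :

$$ \bar{X} = \frac{\sum X}{n} = \frac{147}{9} = 16.3 ; \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{348}{9} = 38.7 ; $$

$$ a_{YX} = \bar{Y} - b_{YX}\bar{X} = 38.7 - 1.18 \times 16.3 = 19.47 ; $$

$$ \hat{Y} = a_{YX} + b_{YX} X ; \quad \text{or,} \quad \hat{Y} = 19.47 + 1.18\,X . $$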
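The intercept in Example 8.12.2 is worked out from means and a slope that have already been rounded. The short sketch below, with illustrative variable names, repeats the computation both with the rounded figures and at full precision, to show how far rounding of intermediate values can shift $a_{YX}$.

```python
# Vocabulary (X) and typewriting (Y) scores of Example 8.12.2.
X = [8, 22, 35, 19, 23, 13, 2, 14, 11]
Y = [29, 48, 55, 45, 50, 35, 18, 38, 30]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# Slope from raw scores.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept at full precision.
a_full = sum_y / n - b * sum_x / n

# Intercept from the rounded figures, as in the hand computation.
a_rounded = round(sum_y / n, 1) - round(b, 2) * round(sum_x / n, 1)

print(f"b_YX = {b:.4f}")
print(f"a_YX = {a_full:.2f} (full precision) vs {a_rounded:.2f} (rounded inputs)")
```

The slope is barely affected by rounding, whereas the intercept moves from about 19.40 to about 19.47. Within the sampled range of X the predicted scores differ only slightly, but keeping extra decimals until the final step is the safer habit.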


Example 8.12.3. 

Compute the regression of Maths marks on the differential aptitude test (DAT) scores, using the following data, and work out the SE of estimate.

No. of students = 64. Mean DAT score ($\bar{X}$) = 90.2 ($s_X$ = 28.35). Mean Maths marks ($\bar{Y}$) = 52.5 ($s_Y$ = 22.40). $r_{YX}$ = + 0.60.

Solution :

n = 64 ; $r_{YX}$ = + 0.60 ; $\bar{X}$ = 90.2 ; $s_X$ = 28.35 ; $\bar{Y}$ = 52.5 ; $s_Y$ = 22.40.

$$ b_{YX} = r_{YX}\,\frac{s_Y}{s_X} = 0.60 \times \frac{22.40}{28.35} = 0.47 ; $$

$$ a_{YX} = \bar{Y} - b_{YX}\bar{X} = 52.5 - 0.47 \times 90.2 = 10.11 . $$

$$ \therefore \ \hat{Y} = a_{YX} + b_{YX} X = 10.11 + 0.47\,X . $$

$$ s_{YX} = s_Y\sqrt{1 - r_{YX}^2} = 22.40\sqrt{1 - (0.60)^2} = 17.92 . $$

Two $\hat{Y}$ scores are computed, using two X scores chosen at random from within the range of X scores, and plotted against the respective X scores (Fig. 8.4).

Fig. 8.4. Regression line of Maths marks on DAT scores.

(a) Where X = 70, $\hat{Y}$ = 10.11 + 0.47 x 70 = 43.0.

(b) Where X = 110, $\hat{Y}$ = 10.11 + 0.47 x 110 = 61.8.

Example 8.12.4. 

Work out the linear regression equation of oxygen consumption (Y ml per minute) on tracheal ventilation (X ml per minute) using the following data of a sample of ten insects.

Individual :     1      2      3      4      5      6      7      8      9     10
Y ml :         3.0    4.0    3.3    2.8    3.5    2.2    2.5    4.2    2.3    2.3
X ml :        75.2   86.0   78.0   75.4   84.5   65.0   72.0   88.2   71.4   68.0


Solution :

1. First method using raw scores :

(a) X and Y scores are totalled to give $\sum X$ and $\sum Y$ respectively (Table 8.27).

(b) Each X score is squared and all such $X^2$ values are totalled to give $\sum X^2$.

(c) The X score of each case is multiplied by the corresponding Y score and all such products are totalled to give $\sum XY$.


Table 8.27. Table for computing regression coefficient using raw scores.

Individual      X       Y         X²         XY
    1         75.2     3.0     5655.04     225.60
    2         86.0     4.0     7396.00     344.00
    3         78.0     3.3     6084.00     257.40
    4         75.4     2.8     5685.16     211.12
    5         84.5     3.5     7140.25     295.75
    6         65.0     2.2     4225.00     143.00
    7         72.0     2.5     5184.00     180.00
    8         88.2     4.2     7779.24     370.44
    9         71.4     2.3     5097.96     164.22
   10         68.0     2.3     4624.00     156.40
    Σ        763.7    30.1    58870.65    2347.93

(d) The data of Table 8.27 are used in computing $b_{YX}$.

$$ b_{YX} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{10 \times 2347.93 - 763.7 \times 30.1}{10 \times 58870.65 - (763.7)^2} = 0.09 . $$

2. Alternative method using the sum of products :

(a) The X and Y scores are totalled to give respectively $\sum X$ and $\sum Y$ which are used in working out the respective means, $\bar{X}$ and $\bar{Y}$ (Table 8.28).

$$ \bar{X} = \frac{\sum X}{n} = \frac{763.7}{10} = 76.4 ; \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{30.1}{10} = 3.01 . $$

(b) Deviations of X scores from $\bar{X}$ are worked out and squared, and these squared deviations are totalled to give the sum of squares, viz., $\sum (X - \bar{X})^2$. From Table 8.28,

$$ \sum (X - \bar{X})^2 = 546.89 . $$




Table 8.28. Table for computing regression coefficient using the sum of squares.

Individual      X       Y      X - X̄    (X - X̄)²    Y - Ȳ    (X - X̄)(Y - Ȳ)
    1         75.2     3.0     - 1.2       1.44     - 0.01        0.012
    2         86.0     4.0     + 9.6      92.16     + 0.99        9.504
    3         78.0     3.3     + 1.6       2.56     + 0.29        0.464
    4         75.4     2.8     - 1.0       1.00     - 0.21        0.210
    5         84.5     3.5     + 8.1      65.61     + 0.49        3.969
    6         65.0     2.2     -11.4     129.96     - 0.81        9.234
    7         72.0     2.5     - 4.4      19.36     - 0.51        2.244
    8         88.2     4.2     +11.8     139.24     + 1.19       14.042
    9         71.4     2.3     - 5.0      25.00     - 0.71        3.550
   10         68.0     2.3     - 8.4      70.56     - 0.71        5.964
    Σ        763.7    30.1               546.89                  49.193


(c) Deviation of each Y score from $\bar{Y}$ is worked out and multiplied by the deviation of the corresponding X score from $\bar{X}$ ; all such products are totalled to give $\sum (X - \bar{X})(Y - \bar{Y})$. From Table 8.28,

$$ \sum (X - \bar{X})(Y - \bar{Y}) = 49.193 . $$

(d) The sum of squares and the sum of products are used in working out $b_{YX}$.

$$ b_{YX} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{49.193}{546.89} = 0.09 . $$

3. Computation of $a_{YX}$ and the regression equation :

$$ a_{YX} = \bar{Y} - b_{YX}\bar{X} = 3.01 - 0.09 \times 76.4 = -3.87 . $$

$$ \hat{Y} = a_{YX} + b_{YX} X ; \quad \text{or,} \quad \hat{Y} = -3.87 + 0.09\,X . $$
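A regression equation holds only for the range of predictor scores from which it was derived. The sketch below fits the line of Example 8.12.4 and wraps prediction in a guard that refuses X scores outside the sampled range ; `fit_line` and `predict` are hypothetical helper names, and raising an error is only one possible way of enforcing the limit.

```python
# Tracheal ventilation (X) and oxygen consumption (Y), ml per minute.
X = [75.2, 86.0, 78.0, 75.4, 84.5, 65.0, 72.0, 88.2, 71.4, 68.0]
Y = [3.0, 4.0, 3.3, 2.8, 3.5, 2.2, 2.5, 4.2, 2.3, 2.3]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b of ys on xs, from raw scores."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

def predict(x, a, b, x_min, x_max):
    """Predict Y for x, but only within the sampled range of X."""
    if not x_min <= x <= x_max:
        raise ValueError(f"X = {x} lies outside the sampled range {x_min} to {x_max}")
    return a + b * x

a, b = fit_line(X, Y)
print(f"a_YX = {a:.2f}, b_YX = {b:.4f}")
print(f"Predicted Y at X = 80: {predict(80.0, a, b, min(X), max(X)):.2f}")
```

Calling `predict(100.0, a, b, min(X), max(X))` raises a `ValueError`, since 100 ml per minute exceeds the largest tracheal ventilation in the sample (88.2). The negative intercept is a reminder of the same limit : the line is not meant to be read at X = 0.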


8.13 MULTIPLE REGRESSION 

Multiple regression, a method of multivariate statistics, predicts the most likely value of a variable (criterion or dependent variable) from the values of two or more other variables (predictors or independent variables). It can be computed only if the variables possess significant correlations with each other. It translates the relation between the criterion and a combination of two or more predictors into an expression showing the criterion as a function of the predictors. Multiple regression may be linear or nonlinear according as the criterion is a linear or nonlinear function of the predictors. Multiple regressions belong to models I, II and III according as all the predictors are ‘fixed’ treatments, or all are classification variables, or some of them are treatment variables and others classification variables. The Dubois-Dubois formula for predicting the human body surface area from height and weight is a model II multiple regression equation.




Assumptions 

It should be reasonable to assume that :

(a) all the variables involved are continuous measurement variables ;

(b) their scores have unimodal and fairly symmetrical distributions in the population ;

(c) the paired scores of each pair of variables in an individual are independent of all other such paired scores in the sample ;

(d) there is a linear association between the scores of each pair of variables ;

(e) the actual scores of the criterion in a large number of individuals possessing given predictor scores are distributed normally, independent of each other, around the criterion score predicted from those predictor scores and all such distributions obey homoscedasticity.

Multiple linear regression with three variables

Such a regression predicts the most likely value $\hat{X}_1$ of the criterion $X_1$ from the given values of two predictors $X_2$ and $X_3$. The general regression equation for the straight line is as follows :

$$ \hat{X}_1 = a_{1.23} + b_{12.3} X_2 + b_{13.2} X_3 . $$

$b_{12.3}$ and $b_{13.2}$ are the partial regression coefficients, and $a_{1.23}$ is the Y-intercept of the regression line. $b_{12.3}$ gives the slope of the regression line of $X_1$ on $X_2$ when $X_3$ is partialled out or kept constant, and $b_{13.2}$ gives that of the regression line of $X_1$ on $X_3$ when $X_2$ is kept constant. $a_{1.23}$, $b_{12.3}$ and $b_{13.2}$ are computed using the means and SDs ($s_1$, $s_2$ and $s_3$) of the respective variables, and the beta coefficients ($\beta_2$ and $\beta_3$).

$$ \beta_2 = \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} ; \qquad \beta_3 = \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} . $$

$$ b_{12.3} = \beta_2 \times \frac{s_1}{s_2} ; \qquad b_{13.2} = \beta_3 \times \frac{s_1}{s_3} ; $$

$$ a_{1.23} = \bar{X}_1 - b_{12.3}\bar{X}_2 - b_{13.2}\bar{X}_3 . $$

The values of $a_{1.23}$, $b_{12.3}$ and $b_{13.2}$ are then put in the preceding general formula for regression equation to give the required regression equation.

The SE of estimate ($s_{1.23}$) of $X_1$ on $X_2$ and $X_3$ is computed using the multiple correlation coefficient $R_{1.23}$.

$$ R_{1.23} = \sqrt{\beta_2 r_{12} + \beta_3 r_{13}} ; \qquad s_{1.23} = s_1\sqrt{1 - R_{1.23}^2} . $$

Other 3-variable multiple regressions, e.g., those of $X_2$ on $X_1$ and $X_3$, are similarly worked out using specific beta coefficients.

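Multiple linear regression with any number of variables

Where m is the total number of variables involved, the regression of $X_1$ on $X_2, X_3, \ldots, X_m$ is worked out as follows :

$$ \hat{X}_1 = a_{1.23 \ldots m} + b_{12.34 \ldots m} X_2 + b_{13.24 \ldots m} X_3 + \cdots + b_{1m.23 \ldots (m-1)} X_m ; $$

or,

$$ \hat{X}_1 = \bar{X}_1 + \beta_2 \frac{s_1}{s_2}(X_2 - \bar{X}_2) + \beta_3 \frac{s_1}{s_3}(X_3 - \bar{X}_3) + \cdots + \beta_m \frac{s_1}{s_m}(X_m - \bar{X}_m) . $$

The SE of estimate is worked out as follows using the coefficient of multiple determination ($R^2_{1.23 \ldots m}$).

$$ s_{1.23 \ldots m} = s_1\sqrt{1 - R^2_{1.23 \ldots m}} . $$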







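Example 8.13.1.

Compute the multiple linear regression equation of arithmetic reasoning test scores ($X_1$) on the combination of numerical operation test scores ($X_2$) and mechanical knowledge test scores ($X_3$), using the following data from a sample of 103 students.

$\bar{X}_1 = 9.7$, $s_1 = 2.20$ ;
$\bar{X}_2 = 44.0$, $s_2 = 8.80$ ;
$\bar{X}_3 = 26.2$, $s_3 = 6.60$ ;
$r_{12} = +0.46$ ; $r_{13} = +0.20$ ; $r_{23} = +0.17$.

Solution :

$$\beta_2 = \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} = \frac{0.46 - 0.20 \times 0.17}{1 - 0.17^2} = 0.44 .$$

$$\beta_3 = \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} = \frac{0.20 - 0.46 \times 0.17}{1 - 0.17^2} = 0.13 .$$

$$b_{12.3} = \beta_2 \times \frac{s_1}{s_2} = 0.44 \times \frac{2.20}{8.80} = 0.11 .$$

$$b_{13.2} = \beta_3 \times \frac{s_1}{s_3} = 0.13 \times \frac{2.20}{6.60} = 0.043 \approx 0.04 .$$

$$a_{1.23} = \bar{X}_1 - b_{12.3} \bar{X}_2 - b_{13.2} \bar{X}_3 = 9.7 - 0.11 \times 44.0 - 0.043 \times 26.2 = 3.73 .$$

The partial regression coefficients and $a_{1.23}$ are then used to compute the regression equation.

$$\hat{X}_1 = a_{1.23} + b_{12.3} X_2 + b_{13.2} X_3 ; \quad \text{or,} \quad \hat{X}_1 = 3.73 + 0.11 X_2 + 0.04 X_3 .$$

$$R_{1.23} = \sqrt{\beta_2 r_{12} + \beta_3 r_{13}} = \sqrt{0.44 \times 0.46 + 0.13 \times 0.20} = 0.478 .$$

$$s_{1.23} = s_1 \sqrt{1 - R_{1.23}^2} = 2.20 \sqrt{1 - (0.478)^2} = 1.932 .$$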
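The steps of the three-variable regression can be sketched in Python as follows. This is only an illustrative sketch: the function name `three_variable_regression` and the layout of its result are assumptions made here, not part of the text. It works from the means, SDs and product-moment r values alone and keeps full precision, so its results differ slightly from the hand computation above, which rounds the beta coefficients to two decimals before using them.

```python
import math


def three_variable_regression(means, sds, r12, r13, r23):
    """Regression of X1 on X2 and X3 from summary statistics.

    means and sds are (X1, X2, X3) triples; the name and the returned
    dictionary are illustrative assumptions, not the book's notation.
    """
    m1, m2, m3 = means
    s1, s2, s3 = sds
    denom = 1 - r23 ** 2
    # Beta coefficients from the product-moment r values.
    beta2 = (r12 - r13 * r23) / denom
    beta3 = (r13 - r12 * r23) / denom
    # Partial regression coefficients and the intercept.
    b12_3 = beta2 * s1 / s2
    b13_2 = beta3 * s1 / s3
    a1_23 = m1 - b12_3 * m2 - b13_2 * m3
    # Multiple correlation coefficient and SE of estimate.
    big_r = math.sqrt(beta2 * r12 + beta3 * r13)
    se = s1 * math.sqrt(1 - big_r ** 2)
    return {"beta2": beta2, "beta3": beta3, "b12_3": b12_3,
            "b13_2": b13_2, "a1_23": a1_23, "R": big_r, "se": se}


# Data of Example 8.13.1.
result = three_variable_regression(
    means=(9.7, 44.0, 26.2), sds=(2.20, 8.80, 6.60),
    r12=0.46, r13=0.20, r23=0.17)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

Rounded to two decimals, the beta coefficients, the partial regression coefficients and the SE of estimate agree with the worked solution; the intercept comes out near 3.78 instead of 3.73 because no intermediate value is rounded.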


Example 8.13.2.

The means and standard deviations of thyroid calcitonin content ($X_1$), thyroxine secretion rate ($X_2$) and serum calcium concentration ($X_3$) were found to be as follows :

$\bar{X}_1$ = 395 mU/gland ; $s_1$ = 72.16 mU ;
$\bar{X}_2$ = 1.10 µg/100 g body weight ; $s_2$ = 0.32 µg ;
$\bar{X}_3$ = 8.2 mg/dL ; $s_3$ = 2.05 mg/dL.

The product-moment r values between the respective variables of each pair were worked out as follows :

$r_{12} = -0.38$ ; $r_{13} = +0.65$ ; $r_{23} = -0.12$.

Work out the regression equation of thyroidal calcitonin on the combination of thyroxine secretion rate and serum calcium.




Solution :

(a) The beta coefficients are computed from the product-moment r values.

$$\beta_2 = \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} = \frac{-0.38 - 0.65 \times (-0.12)}{1 - (-0.12)^2} = -0.31 ;$$

$$\beta_3 = \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} = \frac{0.65 - (-0.38) \times (-0.12)}{1 - (-0.12)^2} = 0.61 .$$

(b) The partial regression coefficients are computed using the beta coefficients and standard deviations.

$$b_{12.3} = \beta_2 \times \frac{s_1}{s_2} = -0.31 \times \frac{72.16}{0.32} = -69.91 ; \quad b_{13.2} = \beta_3 \times \frac{s_1}{s_3} = 0.61 \times \frac{72.16}{2.05} = 21.47 .$$

(c) The partial regression coefficients and the means are used in computing $a_{1.23}$ as follows :

$$a_{1.23} = \bar{X}_1 - b_{12.3} \bar{X}_2 - b_{13.2} \bar{X}_3 = 395 - (-69.91) \times 1.10 - 21.47 \times 8.2 = 295.85 .$$

(d) The multiple regression equation of $X_1$ on $X_2$ and $X_3$ may be written as follows :

$$\hat{X}_1 = a_{1.23} + b_{12.3} X_2 + b_{13.2} X_3 ; \quad \therefore \ \hat{X}_1 = 295.85 - 69.91 X_2 + 21.47 X_3 .$$

(e) The SE of estimate ($s_{1.23}$) is computed using the multiple correlation coefficient ($R_{1.23}$).

$$R_{1.23} = \sqrt{\beta_2 r_{12} + \beta_3 r_{13}} = \sqrt{-0.31 \times (-0.38) + 0.61 \times 0.65} = 0.717 ;$$

$$\therefore \ s_{1.23} = s_1 \sqrt{1 - R_{1.23}^2} = 72.16 \sqrt{1 - (0.717)^2} = 50.30 .$$


GLOSSARY 

coefficient of determination : measure of that proportion of the total variance of a variable, which is
associated with the variance of a correlated variable.

coefficient of nondetermination : measure of that proportion of the total variance of a variable, which is
not associated with the variance of a correlated variable.

contingency coefficient : a nonparametric correlation coefficient for correlating two discontinuous or
nominal variables divided into more than two classes.

correlation : statistics for establishing and estimating both the degree and the direction of association
between two or more variables.

correlation, linear : correlation between variables having an association conforming to a straight line and
giving a linear scattergram.

correlation, multiple : correlation between a given variable and the weighted sum of two or more other
variables.

correlation, negative : correlation where the changes in the magnitude of one variable are associated with
the reverse changes of the other variable.

correlation, partial : correlation between two variables, eliminating the effects of one or more other
variables correlated with them.




correlation, positive : correlation where the changes in the magnitude of one variable are associated with
the changes, in the same direction, of the other variable.

correlation, simple : correlation between only two variables.

correlation coefficient : a statistic for establishing and estimating both the magnitude and the direction 
(algebraic sign) of the association between two or more variables. 

criterion : the dependent variable whose most likely value is predicted by regression in an individual or 
case, using the known value(s) of one or more predictors (independent variables) in the latter. 

homoscedasticity : homogeneity of the variances of samples drawn from the same (or similar) population(s)
so that such variances may serve as estimates of the same population variance and their differences
can be explained away by their sampling errors.

Kendall’s tau : a nonparametric correlation coefficient for simple linear correlation between the ranks of the
individuals or cases with respect to the two corresponding variables.

phi coefficient : a nonparametric correlation coefficient for correlating the classes of two genuinely 
dichotomous variables. 


point biserial r : a modified form of product-moment r for linear correlation between the scores of a 
continuous variable and the classes of a genuinely dichotomous variable. 

predictor : an independent variable whose known value in an individual or case is used for predicting in the 
latter the most likely value of a criterion (dependent variable). 

product-moment r : a correlation coefficient for simple linear correlation between the scores of two 
continuous variables with normal or near-normal distributions in the population. 

regression : statistical prediction of the most likely value of a criterion in an individual or case, using the 
known value(s) of one or more predictors in the latter. 

regression, linear : regression where the criterion has linear correlation(s) with the predictor(s). 

regression, model I : regression using “fixed” treatment variable(s) as predictor(s). 

regression, model II : regression using classification variable(s) as predictor(s). 

regression, model III : multiple regression using a combination of both treatment and classification
variables as predictors.

regression, multiple : regression using two or more predictors. 

regression, simple : regression using a single predictor. 

regression coefficient : a prediction statistic which estimates and expresses the average rate of change in the 
value of a criterion for unit changes in the value of a predictor. 

Spearman’s rho : a nonparametric correlation coefficient for simple linear correlation between the ranks of 
the individuals or cases with respect to the two corresponding variables. 

standard error of estimate : a statistic estimating the differences of the predicted value of a criterion from 
the actual values of the latter in individuals or cases possessing a given value of the predictor. 

sum of products : sum of the products of deviations of the scores of two variables, in each individual case 
of the sample, from their respective means. 

sum of squares : sum of the squared deviations of the scores of a variable from a central value such as the
mean.

tetrachoric r : a correlation coefficient for correlating the classes of two apparently dichotomous or
artificially dichotomized variables.







NONPARAMETRIC STATISTICS

Nonparametric statistics require very few assumptions, no estimate of a parameter in their computation, and no normal distribution of the variables in the population.

9.1 NONPARAMETRIC STATISTICS

Parametric statistics serve as estimates of the corresponding population parameters, are computed from sample statistics, and are interpreted on the assumption of normal distributions of the variables in the population. They cannot be applied to nominal, ordinal and discrete variables, or to non-normal distributions.

Nonparametric (distribution-free) statistics
(i) require very few assumptions, (ii) do not
require normal distributions of the variables in
the population, (iii) do not use any
precomputed statistic as an estimate of
parameter in the computation, (iv) can be used
for even very small samples and (v) are
computed by much simpler methods ; but (vi)
their powers are lower than their parametric
counterparts so long as the assumptions for the
latter are fulfilled.

Spearman’s and Kendall’s rank correlations
as well as phi and contingency coefficients,
discussed in chapter 8, are nonparametric
statistics. Some other nonparametric statistics
are described here.

9.2 CHI SQUARE TESTS

Nonparametric chi square ($\chi^2$) test explores the significance of deviation of an experimentally observed frequency distribution from a proposed frequency distribution and, therefore, constitutes an analysis of frequencies. $\chi^2$ is the sum of the ratios of the squared deviations of the observed frequencies ($f_o$) from the expected frequencies ($f_e$) to the corresponding expected frequencies :

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

The $\chi^2$ distributions are positively skewed probability distributions (Fig. 9.1). The total area under each $\chi^2$ distribution is taken as 1.00. Each $\chi^2$ distribution is asymptotic to the abscissa in its right tail : the right tail reaches the abscissa at $+\infty$ only. $\chi^2$ cannot have negative values. At high df, the $\chi^2$ distribution is very little skewed ; indeed, it approaches almost a bilateral symmetry.

Fig. 9.1. Some probability distributions of chi squares.




1. Chi square test of independence

An association exists between two variables if a value or quality of one variable is accompanied by a similar or opposite value or quality of the other in the same individual. Absence of association is called the independence of two variables. A $\chi^2$ test of independence (test of association) explores whether or not two variables are significantly associated with each other. It tests only the significance of the association between the variables, but neither (i) measures the magnitude or direction of the association, (ii) nor predicts a cause-and-effect relationship between the variables.

The data consist of frequencies of paired observations of two variables, often discrete or nominal, distributed in different combinations of their classes. $\chi^2$ is computed in the following way.

(a) The observed frequencies ($f_o$) are first arranged in a contingency table showing the association between two variables in their combined distribution. A contingency table is a two-way classification table with the classes of one variable arranged in r number of rows and those of the other in c number of columns (Table 9.1). Each cell of the table represents a specific combination of two classes, one of each variable. According to the number of rows and columns, i.e., the number of classes of the variables, the table is an r × c-fold table. For example, it is a 3 × 2-fold table if it has three rows and two columns. The frequency ($f_o$) of cases falling in the specified classes of the two variables is entered in each cell. The total frequencies of each row and of each column are entered in the marginal column and the marginal row, respectively ; these total frequencies are called the marginal total frequencies ($f_r$ and $f_c$). Their grand total amounts to n, where n is the sample size.

(b) The expected frequency ($f_e$) of each cell is computed on the basis of the $H_0$ of independence and entered in the corresponding cell (Table 9.2). The $H_0$ proposes that the variables are independent of each other, so that any observed association has resulted from mere chances of random sampling and would not have been found if the entire population was studied instead of a random sample. To keep the marginal totals and the sample size (n) constant, the $f_e$ values are directly computed, according to the following formula in terms of the $H_0$, for only as many chosen cells as the df of the $\chi^2$, where df = (r − 1)(c − 1).

$$f_e = \frac{f_r f_c}{n}$$

where $f_r$ and $f_c$ are marginal total frequencies of respectively the row and the column to which the given cell belongs. The $f_e$ of each of the remaining cells is computed by subtracting the already computed $f_e$ values from the marginal totals.

(c) The difference $(f_o - f_e)$ is worked out for each cell and entered in the table.

(d) Each such difference is squared to give the corresponding $(f_o - f_e)^2$.

(e) Each of the latter is divided by the corresponding $f_e$ to get the ratio $(f_o - f_e)^2 / f_e$. The sum of all these ratios is used in working out $\chi^2$.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
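Steps (a) to (e) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `chi_square_independence` is hypothetical, and every $f_e$ is computed directly as $f_r f_c / n$ rather than by subtraction from the marginal totals, which gives the same values. The frequencies are the crossover data of Table 9.1.

```python
def chi_square_independence(table):
    """Chi square and df for an r x c table of observed frequencies.

    table is a list of rows; the function name and return layout are
    illustrative assumptions, not the book's notation.
    """
    row_totals = [sum(row) for row in table]          # f_r
    col_totals = [sum(col) for col in zip(*table)]    # f_c
    n = sum(row_totals)                               # sample size
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n   # f_e = f_r * f_c / n
            expected_row.append(f_e)
            chi2 += (f_o - f_e) ** 2 / f_e
        expected.append(expected_row)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Crossover data of Table 9.1: rows are mother's ages of 5, 10, 20 and
# 30 days; columns are crossovers and noncrossovers.
crossover = [[247, 291], [152, 284], [174, 393], [140, 495]]
chi2, df, expected = chi_square_independence(crossover)
print(round(chi2, 2), df)
```

Because the sketch keeps unrounded $f_e$ values, its $\chi^2$ can differ in the first decimal place from a hand computation that rounds each $f_e$ to one decimal.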


Yates’ correction :

If any computed $f_e$ is less than 5 and the $\chi^2$ has the df of 1 only, Yates’ correction has to be applied. Yates’ correction brings each $(f_o - f_e)$ closer to zero by 0.5 : this means the subtraction of 0.5 from each positive $(f_o - f_e)$ and the addition of 0.5 to each negative $(f_o - f_e)$. The corrected $(f_o - f_e)$ values are used for computing $\chi^2$. In other words,

$$\chi^2 = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e}$$

where the bars on two sides of $f_o - f_e$ indicate that all values of $(f_o - f_e)$ are taken as positive, ignoring their algebraic signs.

Alternative formula for fourfold contingency tables :

For the 2 × 2-fold contingency tables, the following alternative method may be used, avoiding the computation of $(f_o - f_e)$ values.

(a) In the 2 × 2-fold contingency table, the classes of the two variables are arranged exactly like that for working out the phi coefficient (page 181). Thus, the cell A at the top right corner and the cell D at the bottom left corner contain concordant cases belonging either to the high value classes of both variables (cell A) or to the low value classes of both (cell D) ; the cell B at the top left corner and the cell C at the bottom right corner contain discordant cases belonging to the high value class of one variable and the low value class of the other, or vice versa (Table 9.3).

(b) The $f_o$ values for different class combinations are then entered in the respective cells. The marginal totals are also worked out for each row or column and entered as the $f_r$ or $f_c$ (page 181 and Table 9.3). Evidently, the $f_r$ for the cells A and B amounts to the total of their cell frequencies, viz., A + B, and that for the cells C and D is given by the sum of their frequencies, viz., C + D ; similarly, the $f_c$ for B and D amounts to B + D, while the $f_c$ for cells A and C is the sum of their frequencies, viz., A + C.

(c) $\chi^2$ is then computed using the products of pairs of cell frequencies (AD and BC) and the marginal totals.

$$\chi^2 = \frac{n(AD - BC)^2}{(A + B)(A + C)(B + D)(C + D)}$$

df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1.

If it seems that any $f_e$ may be lower than 5, this is checked by computing the lowest $f_e$ possible for any cell, using the smaller of the $f_r$ values and the smaller of the $f_c$ values.

$$\text{lowest } f_e = \frac{(\text{smaller } f_r) \times (\text{smaller } f_c)}{n}$$

When any $f_e$ is found to be less than 5, Yates’ correction is applied by changing the computational formula :

$$\chi^2 = \frac{n(|AD - BC| - n/2)^2}{(A + B)(A + C)(B + D)(C + D)}$$

where the bars on the two sides of AD − BC indicate that (AD − BC) is taken as positive, irrespective of its algebraic sign.
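The fourfold shortcut and its Yates-corrected form can be sketched as follows. The function names are hypothetical. Clamping the corrected numerator at zero when |AD − BC| is smaller than n/2 is an added safeguard assumed here; it is not stated in the text. The cell frequencies are those of the diabetes data of Example 9.2.4 (A = 8, B = 7, C = 2, D = 8).

```python
def lowest_expected(a, b, c, d):
    """Lowest f_e possible in a fourfold table (cells A, B, C, D)."""
    n = a + b + c + d
    smaller_row = min(a + b, c + d)   # smaller f_r
    smaller_col = min(a + c, b + d)   # smaller f_c
    return smaller_row * smaller_col / n


def chi_square_fourfold(a, b, c, d, yates=False):
    """Chi square of a 2 x 2 table without computing (f_o - f_e)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Assumption: never let the corrected difference go below zero.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))


# Diabetes data of Example 9.2.4.
a, b, c, d = 8, 7, 2, 8
use_yates = lowest_expected(a, b, c, d) < 5    # df is 1 for any 2 x 2 table
chi2 = chi_square_fourfold(a, b, c, d, yates=use_yates)
print(use_yates, round(chi2, 2))
```

Deciding on the correction from `lowest_expected` mirrors the check described above, so the same call works whether or not the correction is needed.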

Significance of computed $\chi^2$ :

The $H_0$ contends that the computed $\chi^2$ is not significant and has resulted from mere chances of random sampling. To test this $H_0$ by a two-tail test, the computed $\chi^2$ is compared to the critical $\chi^2$ with the computed df and for the chosen level of significance ($\alpha$). Table C of Appendix lists critical $\chi^2$ values according to their df and $\alpha$ for two-tail tests. If the computed $\chi^2$ exceeds or equals the critical $\chi^2_{\alpha(df)}$, the probability P of obtaining the computed $\chi^2$ by chance is respectively lower than or equal to $\alpha$ and may be considered too low ($P \le \alpha$). The $H_0$ is then rejected and the computed $\chi^2$ is considered significant : the variables are then considered to have a significant association with each other. But if the computed $\chi^2$ is lower than the critical $\chi^2_{\alpha(df)}$, P exceeds $\alpha$ and is too high to justify the rejection of $H_0$ ($P > \alpha$) ; the computed $\chi^2$ is then not significant : the variables have no significant association with each other.

For a one-tail $\chi^2$ test, $\alpha$ for a given critical $\chi^2$ is half that of an identical critical $\chi^2$ for two-tail tests. In other words, the one-tail critical $\chi^2$ for a given significance level is identical with the two-tail critical $\chi^2$ for double that significance level. Thus, the one-tail critical $\chi^2$ (df = 4) for 0.05 level amounts to 7.78 which is identical with the two-tail critical $\chi^2$ (df = 4) for 0.10 level.
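When a table of critical values is not at hand, the probability P of obtaining a $\chi^2$ at least as large as the computed one can be worked out directly. The sketch below is an illustration only: the function name `chi2_upper_tail` is hypothetical, and it relies on the standard recurrence for the upper tail of the $\chi^2$ distribution with whole-number df, using only Python's standard `math` module. The tabulated two-tail levels quoted in this chapter (for example, 7.82 at the 0.05 level for df = 3) correspond to this upper-tail probability.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi square >= x) for a whole-number df (hypothetical helper).

    Starts from the closed forms for df = 1 (erfc) or df = 2 (exp) and
    adds one term of the recurrence for every further 2 degrees of freedom.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive whole number")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q = math.erfc(math.sqrt(half))   # exact for df = 1
        a = 0.5
    else:
        q = math.exp(-half)              # exact for df = 2
        a = 1.0
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


# A computed chi square of 1.56 with df = 1 (as in Example 9.2.4).
p_value = chi2_upper_tail(1.56, 1)
alpha = 0.05
print(round(p_value, 3), "significant" if p_value <= alpha else "not significant")
```

The decision rule is the same as with the table: the $H_0$ is rejected only when P does not exceed the chosen $\alpha$.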


Example 9.2.1.

Using the data presented in Table 9.1 about frequencies of gene crossovers between homologous chromosomes in Drosophila, born of mothers of different ages, find if there is a significant association between the frequency of crossovers and the mother’s age.

Table 9.1. 4 × 2-fold contingency table showing crossover data.

Mother’s age (days)    Crossovers (f_o)    Noncrossovers (f_o)    Total (f_r)
5                      247                 291                    538
10                     152                 284                    436
20                     174                 393                    567
30                     140                 495                    635
Total (f_c)            713                 1463                   2176 (n)

Solution :

A two-tail $\chi^2$ test is applied for testing the $H_0$ which proposes that there is no significant association between crossover frequency and mother’s age.

(a) The $f_o$ values are entered in a 4 × 2-fold contingency table (Table 9.2). The marginal totals ($f_r$ and $f_c$) are worked out and entered.

df = (r − 1)(c − 1) = (4 − 1)(2 − 1) = 3.

(b) The $f_e$ is calculated for each of as many cells as the df (viz., 3) and entered in Table 9.2 against the respective $f_o$ values. For example, for the $f_o$ of 247 of the cell for crossover vs 5 days’ age,

$$f_e = \frac{f_r f_c}{n} = \frac{538 \times 713}{2176} = 176.3 .$$

The $f_e$ values corresponding to the $f_o$ values of 174 and 284 are similarly computed to be 185.8 and 293.1, respectively.

(c) The remaining $f_e$ values are obtained by subtracting the sum of the already computed $f_e$ values from either $f_r$ or $f_c$.

(d) The $(f_o - f_e)$ value is computed for each $f_o$ and entered in Table 9.2. For example, corresponding to the $f_o$ of 247,

$f_o - f_e$ = 247 − 176.3 = 70.7.




(e) $\chi^2$ is then computed using each $(f_o - f_e)$ and the corresponding $f_e$.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(70.7)^2}{176.3} + \frac{(9.1)^2}{142.9} + \frac{(-11.8)^2}{185.8} + \frac{(-68.0)^2}{208.0} + \frac{(-70.7)^2}{361.7} + \frac{(-9.1)^2}{293.1} + \frac{(11.8)^2}{381.2} + \frac{(68.0)^2}{427.0} = 77.21 .$$

(f) Critical $\chi^2$ values (df = 3) for different levels of $\alpha$ are quoted from Table C of Appendix.

$\chi^2_{.05(3)} = 7.82$ ; $\chi^2_{.01(3)} = 11.34$ ; $\chi^2_{.001(3)} = 16.27$.

The computed $\chi^2$ of 77.21 far exceeds even the critical $\chi^2$ for 0.001 level. So, the probability P of getting the computed $\chi^2$ by chance due to random sampling is far lower than 0.001. Hence, the probability is too low for the $H_0$ to be correct. So, the $H_0$ is rejected and the computed $\chi^2$ is considered significant : the frequency of crossovers, therefore, has a significant association with the mother’s age (P < 0.001).

Table 9.2. 4 × 2-fold contingency table for computing $\chi^2$ of crossover data.

Mother’s       Crossovers                        Noncrossovers                     f_r
age (days)     f_o     f_e      f_o − f_e        f_o     f_e       f_o − f_e
5              247     176.3    + 70.7           291     361.7     − 70.7          538
10             152     142.9    + 9.1            284     293.1     − 9.1           436
20             174     185.8    − 11.8           393     381.2     + 11.8          567
30             140     208.0    − 68.0           495     427.0     + 68.0          635
f_c            713     713.0                     1463    1463.0                    2176 (n)


Example 9.2.2. 

Responses of 48 normal humans and 57 neurotics to an item of a neurotic questionnaire are tabulated in 
Table 9.3. Find whether or not the test item differentiates significantly the normal humans from neurotics. 


Solution : 

A two-tail $\chi^2$ test is applied to determine whether or not the test item has a significant association with neurotic condition.

$$\chi^2 = \frac{n(AD - BC)^2}{(A + B)(A + C)(B + D)(C + D)} = \frac{105 (20 \times 20 - 28 \times 37)^2}{48 \times 57 \times 48 \times 57} = 5.67 .$$

df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1.

The computed $\chi^2$ is next compared with critical $\chi^2$ values (df = 1) for different levels of significance (Table C of Appendix) :

$\chi^2_{.05(1)} = 3.84$ ; $\chi^2_{.02(1)} = 5.41$ ; $\chi^2_{.01(1)} = 6.64$.

Because the computed $\chi^2$ of 5.67 is higher than the critical $\chi^2_{.02(1)}$, the probability P of getting the computed $\chi^2$ by chance of random sampling is lower than 0.02. So, the computed $\chi^2$ is significant at 0.02 level. Thus, there is a significant association between the test item and the neurotic condition (P < 0.02). Hence, the test item significantly differentiates the normal from the neurotic.


Example 9.2.3. 

Out of 15 hypertensive human subjects exposed to cold during a cold pressor test, 4 reacted with 10-20 mm Hg rise in the diastolic pressure (normoreactors), 9 showed more than 20 mm Hg rise (hyperreactors) and 2 reacted with less than 10 mm Hg rise (hyporeactors). Out of 25 nonhypertensive subjects similarly treated, 12 were normoreactors, 5 were hyperreactors and 8 were hyporeactors. Is there any significant association between the cold pressor reaction and hypertension ?

Solution :

A two-tail $\chi^2$ test is applied for testing the $H_0$ which contends that there is no significant association between hypertension and cold pressor reaction.

(a) The $f_o$ values are entered in a 3 × 2-fold contingency table and the marginal totals ($f_r$ and $f_c$) are computed and entered therein (Table 9.4).

df = (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2.


Table 9.4. 3 × 2-fold contingency table for cold pressor reaction data.

Cold pressor     Hypertensive                     Nonhypertensive                  f_r
reaction         f_o    f_e      f_o − f_e        f_o    f_e      f_o − f_e
Normoreactor     4      6.00     − 2.00           12     10.00    + 2.00           16
Hyperreactor     9      5.25     + 3.75           5      8.75     − 3.75           14
Hyporeactor      2      3.75     − 1.75           8      6.25     + 1.75           10
Total (f_c)      15     15.00                     25     25.00                     40 (n)


(b) The $f_e$ values are calculated for as many cells as the df (viz., 2) and entered in the table. For example, for the $f_o$ of 9 in the cell for hypertensive vs hyperreactor,

$$f_e = \frac{f_r f_c}{n} = \frac{14 \times 15}{40} = 5.25 .$$

In a similar way, the $f_e$ corresponding to the $f_o$ of 2 in the cell for hypertensive vs hyporeactor is computed to be 3.75.

(c) The $f_e$ values of the remaining cells are then obtained by subtracting the sums of the already computed $f_e$ values from either $f_r$ or $f_c$.

(d) The $(f_o - f_e)$ values are computed for each $f_o$ and entered in Table 9.4. For example, for the $f_o$ of 9,

$f_o - f_e$ = 9 − 5.25 = + 3.75.



(e) One $f_e$ value, viz., 3.75, is lower than 5 ; but df > 1. So, Yates’ correction is not applied.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(-2.00)^2}{6.00} + \frac{(3.75)^2}{5.25} + \frac{(-1.75)^2}{3.75} + \frac{(2.00)^2}{10.00} + \frac{(-3.75)^2}{8.75} + \frac{(1.75)^2}{6.25} = 6.66 .$$

(f) The critical $\chi^2$ values (df = 2) for different levels of $\alpha$ are quoted from Table C of Appendix.

$\chi^2_{.05(2)} = 5.99$ ; $\chi^2_{.02(2)} = 7.82$ ; $\chi^2_{.01(2)} = 9.21$.

The computed $\chi^2$ of 6.66 is higher than the critical $\chi^2$ for 0.05 level. So, the probability P of getting the computed $\chi^2$ by chance due to random sampling is lower than 0.05. The computed $\chi^2$ is, therefore, considered significant. Thus, hypertension has a significant association with the cold pressor reaction (P < 0.05).


Example 9.2.4. 

Out of 15 diabetic subjects, 8 were found to be suffering from hypercholesterolemia while the rest had normal serum cholesterol. Out of 10 nondiabetics, 2 had high serum cholesterol while the rest had normal serum cholesterol level. Is there any significant association between hypercholesterolemia and diabetes ?

Solution :

1. First method avoiding $(f_o - f_e)$ :

(a) The $f_o$ values are entered in a 2 × 2-fold contingency table and the marginal totals ($f_r$ and $f_c$) are also computed and entered therein (Table 9.5).

df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1.

Table 9.5. Fourfold contingency table for diabetes data.

               Nonhypercholesterolemic (f_o)    Hypercholesterolemic (f_o)    Total (f_r)
Diabetic       7 (B)                            8 (A)                         15 (A + B)
Nondiabetic    8 (D)                            2 (C)                         10 (C + D)
Total (f_c)    15 (B + D)                       10 (A + C)                    25 (n)

(b) To find whether any $f_e$ is lower than 5, the lowest $f_e$ is computed for the cell corresponding to the lower values of both $f_r$ and $f_c$. Thus, for the cell C in the present case,

$$f_e = \frac{f_r f_c}{n} = \frac{10 \times 10}{25} = 4 .$$

(c) As df = 1, and one of the cells has an $f_e$ lower than 5, Yates’ correction is applied in computing the $\chi^2$.

$$\chi^2 = \frac{n(|AD - BC| - n/2)^2}{(A + B)(A + C)(B + D)(C + D)} = \frac{25 (|8 \times 8 - 7 \times 2| - 25/2)^2}{15 \times 10 \times 15 \times 10} = 1.56 .$$

2. Alternative method using $(f_o - f_e)$ :

(a) The $f_o$ values are entered in the relevant columns of a 2 × 2-fold contingency table and the marginal totals ($f_r$ and $f_c$) are also worked out and entered therein (Table 9.6).

df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1.

(b) As df = 1, the $f_e$ of only one cell is worked out using the relevant $f_r$ and $f_c$ values and entered in the table. For example, for the cell for nondiabetic vs hypercholesterolemic,

$$f_e = \frac{f_r f_c}{n} = \frac{10 \times 10}{25} = 4 .$$

(c) The $f_e$ of each of the remaining cells is worked out by subtracting the already computed $f_e$ values from either $f_r$ or $f_c$.

(d) As df = 1 and one $f_e$ value (viz., 4) is less than 5, Yates’ correction is applied to all the $(f_o - f_e)$. To do this, 0.5 is added to each negative $(f_o - f_e)$ and subtracted from each positive $(f_o - f_e)$ to give the respective “corrected $(f_o - f_e)$” values (Table 9.6).


Table 9.6. Fourfold contingency table for diabetes data, using $(f_o - f_e)$.

               Nonhypercholesterolemic                      Hypercholesterolemic                         Total
               f_o   f_e   f_o − f_e   Corrected            f_o   f_e   f_o − f_e   Corrected            (f_r)
                                       (f_o − f_e)                                  (f_o − f_e)
Diabetic       7     9     − 2         − 1.5                8     6     + 2         + 1.5                15
Nondiabetic    8     6     + 2         + 1.5                2     4     − 2         − 1.5                10
Total (f_c)    15    15                                     10    10                                     25 (n)

(e) The squares of the “corrected $(f_o - f_e)$” values are used in working out $\chi^2$.

$$\chi^2 = \sum \frac{[\text{corrected } (f_o - f_e)]^2}{f_e} = \frac{(-1.5)^2}{9} + \frac{(1.5)^2}{6} + \frac{(1.5)^2}{6} + \frac{(-1.5)^2}{4} = 1.56 .$$
Interpretation : 

To compare with the $\chi^2$ worked out by any of the preceding methods, the critical $\chi^2$ values (df = 1) for different levels of $\alpha$ are quoted from Table C of Appendix.

$\chi^2_{.05(1)} = 3.84$ ; $\chi^2_{.02(1)} = 5.41$ ; $\chi^2_{.01(1)} = 6.64$.

As the computed $\chi^2$ is lower than even the critical $\chi^2_{.05}$, the probability is too high that the computed $\chi^2$ has been obtained by chance due to random sampling (P > 0.05). Hence, the $H_0$ cannot be rejected. So, the computed $\chi^2$ is not significant : there is no significant association between diabetes and hypercholesterolemia.


2. Chi square test for goodness of fit

This test is used to explore how far a distribution of observed frequencies ($f_o$) fits with a theoretical distribution such as the normal distribution, a binomial distribution, a Mendelian phenotype distribution, and a distribution proposed by the hypothesis of equal probability. The $f_e$ values are computed here on the basis of the proposed theoretical distribution. The $\chi^2$ computed from the $(f_o - f_e)$ values proves significant if it equals or exceeds the critical $\chi^2$ for the chosen level of significance ; in such a case, the $f_o$ distribution differs significantly from the proposed distribution. A computed $\chi^2$ lower than the critical $\chi^2$ indicates that the $f_o$ distribution fits with the proposed distribution and does not differ significantly from the latter.

The classical formula, based on $(f_o - f_e)$ values, is used in computing the $\chi^2$.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} .$$

The alternative formula, avoiding the use of $f_e$ values, cannot be applied as no contingency table is involved. Yates’ correction (page 204) has to be applied if the $f_e$ of any class falls below 5 and the df amounts to 1 only.

Degrees of freedom :

The df of the computed $\chi^2$ depends in this test on the nature of the proposed theoretical distribution and is limited to the number of classes in the distribution whose frequencies are free to vary. Distributions like the Mendelian 9 : 3 : 3 : 1 phenotype distribution and the equal probability hypothesis distribution do not involve any parameter ; so, only one df is lost in keeping the sample size (n) constant : for such distributions, df = k − 1, where k is the number of classes in the distribution. For distributions involving estimates of one or more parameters, the df is further lowered by the number of parameters involved. For example, for a normal distribution, df amounts to k − 3 because three degrees of freedom are lost in keeping n, $\mu$ and $\sigma$ constant ; for binomial and Poisson distributions, df amounts to k − 2 because n and p are to be kept constant.


Example 9.2.5. 

Find whether or not the observed distribution of serum iron concentration, presented in an earlier example, differs significantly from the normal distribution.

Solution :

(a) The observed ($f_o$) and expected ($f_e$) frequencies of the classes are entered in Table 9.7.

Table 9.7. Computation of $\chi^2$ for goodness of fit of serum iron with normal distribution.

Classes     f_o    f_e     f_o − f_e    (f_o − f_e)²    (f_o − f_e)²/f_e
100-109     6      3.5     + 2.5        6.25            1.786
110-119     11     8.4     + 2.6        6.76            0.805
120-129     10     14.7    − 4.7        22.09           1.503
130-139     17     18.4    − 1.4        1.96            0.107
140-149     16     16.4    − 0.4        0.16            0.010
150-159     13     10.5    + 2.5        6.25            0.595
160-169     7      4.8     + 2.2        4.84            1.008
Total                                                   5.814

(b) For each class, the values of $(f_o - f_e)$, $(f_o - f_e)^2$ and $(f_o - f_e)^2 / f_e$ are worked out and entered in the respective columns of Table 9.7. For example, for the class 130-139,

$$f_o - f_e = 17 - 18.4 = -1.4 ; \quad (f_o - f_e)^2 = (-1.4)^2 = 1.96 ; \quad \frac{(f_o - f_e)^2}{f_e} = \frac{1.96}{18.4} = 0.107 .$$

Yates’ correction is not used in spite of some $f_e$ values being lower than 5, because the df exceeds 1.

df = k − 3 = 7 − 3 = 4,

where k is the number of classes and three of them lose their freedom for variation to keep n, $\mu$ and $\sigma$ constant for the normal distribution.

(c) From Table 9.7,

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 5.814 .$$

(d) The critical $\chi^2$ values (df = 4) for different levels of $\alpha$ are quoted from Table C of Appendix.

$\chi^2_{.05(4)} = 9.49$ ; $\chi^2_{.02(4)} = 11.67$ ; $\chi^2_{.01(4)} = 13.28$ ; $\chi^2_{.001(4)} = 18.46$.

As the computed $\chi^2$ is lower than the critical $\chi^2_{.05(4)}$, it is not significant (P > 0.05). So, the observed distribution of serum iron concentration does not differ significantly from the best-fitting normal distribution.


Example 9.2.6. 

Crossing a grey-bodied scarlet-eyed Drosophila with a black-bodied red-eyed one produced all grey-bodied red-eyed flies in the $F_1$ generation. On crossing the $F_1$ flies, the $F_2$ generation gave the following phenotypes : grey-bodied red-eyed = 362 ; black-bodied red-eyed = 128 ; grey-bodied scarlet-eyed = 122 ; black-bodied scarlet-eyed = 44. Do the data have a goodness of fit with the Mendelian 9 : 3 : 3 : 1 distribution ?

Solution :

(a) The $f_o$ of each phenotype is entered in Table 9.8 and the corresponding $f_e$ is computed on the basis of the 9 : 3 : 3 : 1 distribution so that the $f_e$ values of phenotypes conform to the respective proportions in that distribution. For example, for the grey red phenotype,

$$n = 362 + 128 + 122 + 44 = 656 ; \quad f_e = \frac{9}{9 + 3 + 3 + 1} \times n = \frac{9}{16} \times 656 = 369 .$$

(b) For each phenotype, the values of $(f_o - f_e)$, $(f_o - f_e)^2$ and $(f_o - f_e)^2 / f_e$ are computed and entered in the respective columns of Table 9.8. For example, for the black scarlet phenotype,

$$f_o - f_e = 44 - 41 = +3 ; \quad (f_o - f_e)^2 = 3^2 = 9 ; \quad \frac{(f_o - f_e)^2}{f_e} = \frac{9}{41} = 0.2195 .$$

(c) From Table 9.8,

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 0.5637 .$$

df = k − 1 = 4 − 1 = 3,

where k is the number of phenotypes, and one df is lost in keeping n constant for the Mendelian distribution.


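Table 9.8. Computation of $\chi^2$ for goodness of fit with Mendelian distribution.

Phenotypes       f_o    f_e    f_o − f_e    (f_o − f_e)²    (f_o − f_e)²/f_e
Grey red         362    369    − 7          49              0.1328
Black red        128    123    + 5          25              0.2033
Grey scarlet     122    123    − 1          1               0.0081
Black scarlet    44     41     + 3          9               0.2195
Total            656    656                                 0.5637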


(d) Critical $\chi^2$ values (df = 3) for different levels of significance are quoted from Table C of Appendix.

$\chi^2_{.05(3)} = 7.82$ ; $\chi^2_{.02(3)} = 9.84$ ; $\chi^2_{.001(3)} = 16.27$.

As the computed $\chi^2$ of 0.5637 is far lower than the critical $\chi^2$ for 0.05 level, the computed $\chi^2$ is not significant (P > 0.05). So, there is no significant difference between the observed distribution and the 9 : 3 : 3 : 1 distribution : the observed frequency distribution of phenotypes has a significant goodness of fit with the proposed Mendelian distribution.
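The goodness-of-fit computation of this example can be sketched in Python as follows. The function name `chi_square_goodness_of_fit` and the argument `estimated_params` are hypothetical names introduced here for illustration ; the latter stands for the number of parameters estimated for the proposed distribution (0 for a Mendelian ratio, 2 for a normal distribution), so that df = k − 1 − `estimated_params`.

```python
def chi_square_goodness_of_fit(observed, ratios, estimated_params=0):
    """Chi square, df and f_e values for a proposed ratio of classes.

    observed holds the f_o of each class; ratios holds the proposed
    relative proportions (e.g. 9, 3, 3, 1). Names are illustrative.
    """
    n = sum(observed)
    total_ratio = sum(ratios)
    # f_e of each class is its proposed share of the sample size n.
    expected = [n * ratio / total_ratio for ratio in ratios]
    chi2 = sum((f_o - f_e) ** 2 / f_e
               for f_o, f_e in zip(observed, expected))
    df = len(observed) - 1 - estimated_params
    return chi2, df, expected


# F2 phenotype frequencies: grey red, black red, grey scarlet,
# black scarlet, tested against the 9 : 3 : 3 : 1 distribution.
chi2, df, expected = chi_square_goodness_of_fit([362, 128, 122, 44],
                                                [9, 3, 3, 1])
print(expected, round(chi2, 4), df)
```

Passing the proposed ratio rather than precomputed $f_e$ values keeps the total of the expected frequencies equal to n, which is the constraint that costs one degree of freedom.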


Example 9.2.7.

Crossing of a purple-eyed straight-winged Drosophila with a red-eyed curved-winged Drosophila produced dihybrid red-eyed straight-winged females in the F_1 generation. On crossing such females with double-recessive purple-eyed curved-winged males, the F_2 generation showed the following phenotype distribution : red straight = 339 ; purple straight = 612 ; red curved = 725 ; purple curved = 384. Red eyes and straight wings are dominant factors (A and B respectively) while purple eyes and curved wings are recessive factors (a and b respectively).

Solution :

1. χ² test for independent assortment :

It is a χ² test for goodness of fit. The ratio of phenotype frequencies in the F_2 generation would not have differed from 9 : 3 : 3 : 1 if there had been a two-factor independent assortment. The test is worked out as follows.

(a) The f_o of each phenotype is entered in Table 9.9.


Table 9.9. Computation of χ² for independent assortment.

Phenotypes              f_o      f_e        f_o - f_e    (f_o - f_e)²    (f_o - f_e)²/f_e
Red straight (AB)       339     1158.75     - 819.75      671990.06          579.93
Purple straight (aB)    612      386.25     + 225.75       50963.06          131.94
Red curved (Ab)         725      386.25     + 338.75      114751.56          297.09
Purple curved (ab)      384      128.75     + 255.25       65152.56          506.04
Total                  2060     2060.00                                     1515.00




(b) In terms of the proposed 9 : 3 : 3 : 1 distribution, the proportions of the total frequency in the respective phenotypes should be as follows : AB = 9/16, aB = 3/16, Ab = 3/16, ab = 1/16. These proportions are multiplied by the sample size n to give the respective f_e values. For example, for the AB phenotype,

$$n = 339 + 612 + 725 + 384 = 2060 ; \qquad f_e = \frac{9}{16} \times 2060 = 1158.75.$$

(c) For each phenotype, (f_o - f_e), (f_o - f_e)² and (f_o - f_e)²/f_e are computed and entered in the table. For example, for the aB phenotype,

$$f_e = \frac{3}{16} \times 2060 = 386.25 ; \qquad f_o - f_e = 612 - 386.25 = +225.75 ;$$

$$(f_o - f_e)^2 = (225.75)^2 = 50963.06 ; \qquad \frac{(f_o - f_e)^2}{f_e} = \frac{50963.06}{386.25} = 131.94.$$

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1515.00 \ \text{(from Table 9.9)} ; \qquad df = k - 1 = 4 - 1 = 3,$$

because 1 df is lost in keeping n unchanged.

The critical χ² values (df = 3) are quoted from Table C of Appendix.

χ²_.05(3) = 7.82 ; χ²_.001(3) = 16.27.

As the computed χ² of 1515.00 far exceeds the critical χ² for the 0.001 level, it is highly significant (P < 0.001). So, the observed phenotype distribution does not obey the Mendelian 9 : 3 : 3 : 1 distribution of independent assortment.


2. χ² test for linkage :

This is a test of independence. The H_0 proposes that there is no linkage between the two genes, and that any apparent departure from nonlinkage arises from mere chance. To test this H_0, χ² is computed as follows.

(a) The observed frequencies (f_o) are arranged in a fourfold contingency table (Table 9.10) and χ² is computed using the cell frequencies and marginal totals (f_r and f_c).


Table 9.10. Fourfold contingency table for linkage.

                 b               B               Total (f_r)
A                725 (B)         339 (A)         1064 (A + B)
a                384 (D)         612 (C)          996 (C + D)
Total (f_c)     1109 (B + D)     951 (A + C)     2060 (n)

$$\chi^2 = \frac{n(AD - BC)^2}{(A + B)(A + C)(B + D)(C + D)} = \frac{2060\,(339 \times 384 - 725 \times 612)^2}{1064 \times 951 \times 1109 \times 996} = 181.17.$$



df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1.

Critical χ² values (df = 1) are quoted from Table C of Appendix.

χ²_.02(1) = 5.41 ; χ²_.01(1) = 6.64 ; χ²_.001(1) = 10.83.

As the computed χ² far exceeds the critical χ² for the 0.001 level, the computed χ² is highly significant (P < 0.001). So, the H_0 of nonlinkage may be rejected ; there is a significant linkage between the two relevant genes.
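The fourfold formula used for the linkage test lends itself to a direct check. The following is a minimal sketch, assuming the cell labels A, B, C and D of Table 9.10 ; the function name `fourfold_chi_square` is hypothetical.

```python
def fourfold_chi_square(a, b, c, d):
    """Chi-square of a 2 x 2 table: n(AD - BC)^2 / [(A+B)(A+C)(B+D)(C+D)].

    Cell labels follow Table 9.10; the function name is an illustrative
    assumption, not taken from the text.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (a + c) * (b + d) * (c + d)
    return numerator / denominator


# A = red straight, B = red curved, C = purple straight, D = purple curved.
chi2_linkage = fourfold_chi_square(a=339, b=725, c=612, d=384)
print(round(chi2_linkage, 2))
```

With a single degree of freedom, the result is then set against the critical χ² values for df = 1.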


Example 9.2.8.

Out of a total of 288 unsuccessful candidates in an examination, 104 failed in English, 106 in Mathematics and 78 in Bengali. Test whether the results diverge significantly from the expectation that an equal proportion of unsuccessful examinees would fail in each subject (α = 0.05).

Solution :

That an equal proportion of unsuccessful examinees should fail in each subject is a hypothesis of equal probability of failure. According to it, out of a total of 288 unsuccessful examinees, 288/3 or 96 should fail in each subject. So, for each subject, f_e amounts to 96 (Table 9.11).

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 5.084 ; \qquad df = k - 1 = 3 - 1 = 2,$$

because one of 3 classes loses its freedom to vary its f_o if n is to be kept constant. From Table C,

α = 0.05 : critical χ²_.05(2) = 5.99.

As the computed χ² is lower than the critical χ², it is not significant (P > 0.05). So, the observed distribution of failures in the three subjects does not diverge significantly from that expected on the hypothesis of equal probability.


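Example 9.2.9.

In 160 samples of 5 infants each, the numbers of male infants per sample were distributed as shown in Table 9.12. Do the observed frequencies differ significantly from what is expected from the 1 : 1 sex-ratio of human infants (α = 0.05) ?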





Solution :

Sex being a dichotomous variable divided into two classes, binomial distribution may serve as its probability distribution.

Table 9.12.

Males per sample    Number of samples (f_o)    p_e        f_e    f_o - f_e    (f_o - f_e)²    (f_o - f_e)²/f_e
5                       6                      0.03125      5      + 1             1              0.20
4                      23                      0.15625     25      - 2             4              0.16
3                      46                      0.3125      50      - 4            16              0.32
2                      55                      0.3125      50      + 5            25              0.50
1                      27                      0.15625     25      + 2             4              0.16
0                       3                      0.03125      5      - 2             4              0.80
Total                 160 (N)                             160                                     2.14


(a) According to the assumed 1 : 1 sex-ratio, p = q = 0.5, where p and q are the proportions of males and females respectively in the population. The total number (N) of samples drawn is 160. As each sample consists of 5 infants (n = 5), the successive terms of the following binomial expansion of (p + q)^n give the relative expected proportions or probabilities (p_e) of obtaining respectively 5, 4, 3, 2, 1 and 0 males in each sample.

$$(p + q)^n = p^n + n p^{n-1} q + \frac{n(n-1)}{1 \times 2} p^{n-2} q^2 + \frac{n(n-1)(n-2)}{1 \times 2 \times 3} p^{n-3} q^3 + \dots + q^n$$

or,

$$(0.5 + 0.5)^5 = (0.5)^5 + 5 \times (0.5)^4 \times 0.5 + \frac{5 \times 4}{1 \times 2} \times (0.5)^3 \times (0.5)^2 + \frac{5 \times 4 \times 3}{1 \times 2 \times 3} \times (0.5)^2 \times (0.5)^3 + \frac{5 \times 4 \times 3 \times 2}{1 \times 2 \times 3 \times 4} \times 0.5 \times (0.5)^4 + (0.5)^5$$

$$= 0.03125 + 0.15625 + 0.3125 + 0.3125 + 0.15625 + 0.03125.$$

Each p_e thus calculated is entered in Table 9.12.

(b) The expected frequencies (f_e) of males in the samples are then computed by multiplying the p_e of each class with the total number of samples (N = 160). For example, the f_e for 3 males per sample amounts to :

p_e N = 0.3125 × 160 = 50 (Table 9.12).

(c) The differences (f_o - f_e) between the observed and expected frequencies are computed and squared. These squared differences, (f_o - f_e)², thus worked out, are then used in computing χ² (Table 9.12).

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.14 ; \qquad \alpha = 0.05 ; \qquad df = k - 2 = 6 - 2 = 4,$$

because there are 6 classes (k = 6) and two df are lost in respectively keeping N unaltered and estimating the population proportion.

Critical χ²_.05(4) = 9.49 (Table C of Appendix).

The computed χ² of 2.14 is far lower than the critical χ² for the 0.05 level ; so, the computed χ² is not significant (P > 0.05). So, the observed distribution does not differ significantly from that expected on the assumed 1 : 1 sex-ratio.
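The binomial terms and the resulting χ² can be reproduced with the standard-library function `math.comb`. The sketch below is illustrative only ; the helper name `binomial_probabilities` is an assumption, and the observed frequencies are those of Table 9.12.

```python
from math import comb


def binomial_probabilities(n, p):
    """Terms of (p + q)^n in the order n, n-1, ..., 0 successes.

    Illustrative helper; the name is an assumption, not from the text.
    """
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n, -1, -1)]


# Observed numbers of samples with 5, 4, 3, 2, 1 and 0 males (Table 9.12).
observed = [6, 23, 46, 55, 27, 3]
N = sum(observed)                      # total number of samples (160)
p_e = binomial_probabilities(5, 0.5)   # expected proportions
f_e = [p * N for p in p_e]             # expected frequencies
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, f_e))
print(p_e, f_e, round(chi2, 2))
```

The degrees of freedom are not computed here, since they follow the convention stated in the text (k - 2 for a binomial distribution).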




9.3 G TEST

This nonparametric test is frequently preferred to the chi square test for the analysis of frequencies. Like the chi square test, the G test is applicable irrespective of the normality or non-normality of distribution of the given variable in the population. It is applicable even to nominal and ordinal variables and to small samples. For large samples, the log likelihood ratio statistic (G) has sampling distributions closely similar to those of χ². So, the computed G is compared with critical χ² values for testing its significance.

G test for goodness of fit

The G test is used for finding whether or not there exists a significant goodness of fit between an observed frequency distribution and a theoretically expected distribution such as the normal, binomial, Poisson, Mendelian 9 : 3 : 3 : 1 and equal probability distributions. The H_0 contends that the observed distribution does not differ significantly from the proposed distribution, and that any difference between the two distributions can be explained away by mere chances of random sampling. The probability P of this H_0 being correct is then worked out as follows.

(a) On the basis of the proposed hypothesis, the expected frequencies (f_e) are calculated for the different classes of the distribution in the same way as in case of the χ² test (§ 9.2). The computed f_e values would thus conform to the proposed distribution, viz., a normal, binomial, Poisson or Mendelian distribution, as the case may be.

(b) Using the observed frequencies (f_o) and the corresponding expected frequencies (f_e), the statistic G is computed with a calculator having the log_e (ln) key.

$$G = 2 \sum \left[ f_o \ln \frac{f_o}{f_e} \right].$$

Yates' correction : (i) If the distribution has only 2 classes (k = 2) and the sample size is lower than 200 (n < 200), Yates' correction has to be applied to avoid an upward bias in the computed G. For this, 0.5 is subtracted from each f_o higher than the corresponding f_e while 0.5 is added to each f_o lower than the corresponding f_e ; the Yates correction is thus intended to bring each (f_o - f_e) closer to 0 by 0.5. If Yates' correction has been applied, the corrected f_o values should be used, instead of the original f_o values, for computing G. (ii) For distributions with more than 2 classes (k > 2), any class with f_e less than 5 should be combined with an immediately adjacent class so as to raise their combined f_e to 5 or above.

(c) The degrees of freedom of the computed G are calculated on the same principle as in the case of χ² (page 210) : one df is lost for keeping n unaltered and one more df is lost for each parameter involved in the expected distribution. So, where k is the number of classes in the distribution, the df will amount to (k - 2) for binomial and Poisson distributions, (k - 3) for normal distributions, and (k - 1) for distributions involving no parameter (e.g., Mendelian phenotype distributions, equal probability distribution, etc.).

(d) The computed G is next compared with critical χ² values having the computed df. If the computed G exceeds or equals the critical χ² for a chosen level of significance, the probability P for the H_0 being correct is considered too low and the computed G is significant (P ≤ α) ; observed and expected distributions then differ significantly and there is no significant goodness of fit between the distributions. If, on the contrary, G is lower than the critical χ², it is not significant (P > α) ; there is a significant goodness of fit between observed and expected distributions and they do not differ significantly.




Example 9.3.1.

Crossing a grey-bodied scarlet-eyed Drosophila with a black-bodied red-eyed one produced all grey-bodied red-eyed flies in the first filial (F_1) generation. On crossing these F_1 flies, the F_2 generation gave the following numbers of flies of different phenotypes : grey-bodied red-eyed : 362 ; black-bodied red-eyed : 128 ; grey-bodied scarlet-eyed : 122 ; black-bodied scarlet-eyed : 44. Apply G test to explore whether there is a goodness of fit between the observed F_2 phenotype distribution and the Mendelian 9 : 3 : 3 : 1 distribution.

Solution :

The H_0 contends that there is no significant difference between the observed distribution of F_2 phenotypes and the proposed Mendelian 9 : 3 : 3 : 1 distribution. A two-tail G test is undertaken to find the probability P of this H_0 being correct.

(a) The observed frequencies (f_o) in different phenotypes are tabulated in Table 9.13.

(b) In terms of the Mendelian 9 : 3 : 3 : 1 distribution, the proportions of the total frequency (n) in the respective phenotypes should be :

grey-red : 9/16 ; black-red : 3/16 ; grey-scarlet : 3/16 ; black-scarlet : 1/16.

The sample size (n) is multiplied by these proportions to give the respective f_e values of the phenotypes. For example, for the black-red phenotype,

$$n = 656 \ \text{(from Table 9.13)} ; \qquad f_e = n \times \frac{3}{16} = 656 \times \frac{3}{16} = 123.$$

(c) Using a calculator, the f_o/f_e, ln (f_o/f_e) and f_o × ln (f_o/f_e) values are worked out for each phenotype and entered in Table 9.13. For example, for the black-red phenotype,

$$\frac{f_o}{f_e} = \frac{128}{123} = 1.0407 ; \qquad \ln \frac{f_o}{f_e} = \ln (1.0407) = 0.0399 ; \qquad f_o \times \ln \frac{f_o}{f_e} = 128 \times 0.0399 = 5.1072.$$

Table 9.13. Table for computing G for goodness of fit with Mendelian distribution.

Phenotypes        f_o        f_e     f_o/f_e    ln (f_o/f_e)    f_o × ln (f_o/f_e)
Grey-red          362        369     0.9810      - 0.0192         - 6.9504
Black-red         128        123     1.0407        0.0399           5.1072
Grey-scarlet      122        123     0.9919      - 0.0081         - 0.9882
Black-scarlet      44         41     1.0732        0.0706           3.1064
Total             656 (n)    656                                    0.2750


$$G = 2 \sum \left[ f_o \ln \frac{f_o}{f_e} \right] = 2 \times 0.2750 = 0.55.$$

Because the proposed distribution is a Mendelian distribution, df = k - 1 = 4 - 1 = 3.

Critical χ² values (df = 3) are quoted from Table C of Appendix.

χ²_.01(3) = 11.34 ; χ²_.05(3) = 7.82 ; χ²_.10(3) = 6.25.

As the computed G is lower than the critical χ² for the 0.05 level, the probability P of the H_0 being correct is considered too high and the H_0 cannot be rejected (P > 0.05). So, the computed G is considered not significant and it is inferred that there is no significant difference between the observed phenotype distribution and the Mendelian 9 : 3 : 3 : 1 distribution ; in other words, there is a significant goodness of fit between the two distributions.
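The G statistic for goodness of fit can be sketched as follows. This is an illustrative sketch only ; the function name `g_goodness_of_fit` and its `parameters` argument are assumptions introduced for the example. Because the code carries the logarithms at full precision instead of four decimal places, it gives G ≈ 0.56 where the hand computation above gives 0.55.

```python
from math import log


def g_goodness_of_fit(observed, expected, parameters=0):
    """G = 2 * sum(f_o * ln(f_o / f_e)) with df = k - 1 - parameters.

    `parameters` is the number of parameters estimated for the expected
    distribution (0 for a Mendelian ratio). Name and signature are
    illustrative assumptions, not taken from the text.
    """
    g = 2 * sum(fo * log(fo / fe)
                for fo, fe in zip(observed, expected) if fo > 0)
    df = len(observed) - 1 - parameters
    return g, df


# Observed and expected frequencies from Table 9.13.
g, df = g_goodness_of_fit([362, 128, 122, 44], [369, 123, 123, 41])
print(round(g, 3), df)
```

Classes with f_o = 0 are skipped because their contribution to the sum is taken as zero.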




Example 9.3.2.

Find whether or not the observed distribution of serum iron concentrations, presented in Example 6.4.1, differs significantly from the normal distribution.

Solution :

(a) Expected frequencies (f_e) of a best-fitting normal distribution are first computed. In the present case, that has already been done stepwise in Example 6.4.1. The observed and expected frequencies tabulated there are entered in Table 9.14 for further computations.


Table 9.14. Table for computing G for goodness of fit with best-fitting normal distribution.

Classes      f_o           f_e              f_o/f_e     ln (f_o/f_e)    f_o × ln (f_o/f_e)
100-109       6 } 17        3.5 } 11.9      1.428571      0.356675         6.063475
110-119      11 }           8.4 }
120-129      10            14.7             0.680272    - 0.385263       - 3.852630
130-139      17            18.4             0.923913    - 0.079137       - 1.345329
140-149      16            16.4             0.975610    - 0.024692       - 0.395072
150-159      13 } 20       10.5 } 15.3      1.307190      0.267880         5.357600
160-169       7 }           4.8 }
Total        80 (n)                                                        5.828044


(b) Two of the seven classes, viz., 100-109 and 160-169, have f_e scores less than 5. Each of these two classes is, therefore, combined with the next class. The final number of classes thus comes to five only : k = 5.

(c) Using a calculator, f_o/f_e, ln (f_o/f_e) and f_o ln (f_o/f_e) values are calculated for each class and entered in Table 9.14. The sum of all the f_o ln (f_o/f_e) scores is then used in computing G.

$$G = 2 \sum \left[ f_o \ln \frac{f_o}{f_e} \right] = 2 \times 5.828044 = 11.66.$$

Because the proposed distribution is a normal distribution, df = k - 3 = 5 - 3 = 2.

Critical χ² scores :

χ²_.05(2) = 5.99 ; χ²_.02(2) = 7.82 ; χ²_.01(2) = 9.21 ; χ²_.001(2) = 13.82.

As the computed G exceeds the critical χ² for the 0.01 level, P < 0.01. The H_0 is rejected and the computed G is significant. So, there is a significant difference between the observed distribution and the normal distribution.
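The pooling of classes with f_e below 5, followed by the computation of G, might be sketched as below. This is a sketch under stated assumptions : the helper `pool_small_classes` is hypothetical, and it merges a small class into the class that follows it (or into the preceding class when it is the last one), which matches how Table 9.14 combines its end classes.

```python
from math import log


def pool_small_classes(observed, expected, minimum=5.0):
    """Merge every class whose f_e is below `minimum` into a neighbour.

    A small class is absorbed by the class after it, or by the class
    before it when it is the last class. Hypothetical helper written
    for this illustration; not taken from the text.
    """
    obs, exp = list(observed), list(expected)
    i = 0
    while i < len(exp) and len(exp) > 1:
        if exp[i] < minimum:
            j = i + 1 if i + 1 < len(exp) else i - 1
            obs[j] += obs[i]
            exp[j] += exp[i]
            del obs[i], exp[i]
            i = 0  # rescan from the start, since positions have shifted
        else:
            i += 1
    return obs, exp


# Serum iron classes 100-109 ... 160-169 (Table 9.14).
f_o = [6, 11, 10, 17, 16, 13, 7]
f_e = [3.5, 8.4, 14.7, 18.4, 16.4, 10.5, 4.8]
pooled_obs, pooled_exp = pool_small_classes(f_o, f_e)
g = 2 * sum(fo * log(fo / fe) for fo, fe in zip(pooled_obs, pooled_exp))
df = len(pooled_obs) - 3  # normal distribution: k - 3
print(pooled_obs, round(g, 2), df)
```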


G test of independence or association

G test can be used to find whether or not two variables have significant association with each other. The H_0 contends here that there is no significant association between the variables and they are independent of each other.

(a) A contingency table is constructed arranging the classes of one variable along its rows, and those of the other variable along the columns. This gives the r × c-fold contingency table with r number of rows for one variable and c number of columns for the other. The observed frequencies (f_o) of cases, belonging to specific combinations of classes of the two variables, are then entered in the respective cells of the contingency table (Table 9.15). The total of the f_o scores of cells of each row or column is entered at the margin of the table as the marginal total, f_r or f_c respectively, of that row or column.

(b) The f_o, f_r and f_c values are used in computing G with a calculator having the log_e (ln) key.

$$G = 2 \left[ \sum (f_o \ln f_o) - \left\{ \sum (f_r \ln f_r) + \sum (f_c \ln f_c) \right\} + n \ln n \right].$$

df = (r - 1)(c - 1).

Yates' correction : In case of the 2 × 2-fold contingency tables, Yates' correction should be applied if the sample size is lower than 200 (n < 200) (Example 9.3.4). For this, 0.5 is added to the f_o values of cells B and C of the contingency table, and is subtracted from the f_o values of cells A and D each, if the product of f_o values of A and D exceeds that of B and C, i.e., if AD - BC is positive (Table 9.17). On the contrary, if AD - BC is negative, 0.5 is subtracted from the f_o values of B and C, and added to those of A and D. The corrected f_o values are then used, instead of the original observed f_o values, in computing G by the preceding formula.

(c) The computed G is next compared with critical χ² values having the given df. If the computed G exceeds or equals the critical χ² for the chosen level of significance, the P of the H_0 being correct is considered too low (P ≤ α) ; the variables under consideration then possess a significant association with each other. If, on the contrary, the computed G is lower than the critical χ², the variables do not have any significant association with each other and are independent of each other (P > α).


Example 9.3.3.

40 out of 100 diabetics and 25 out of 120 nondiabetics suffered from high blood pressure (hypertension) while the others of both groups were free from hypertension. Is there a significant association between diabetes and hypertension ?

Solution :

A two-tail G test is undertaken to find the probability P of correctness of the H_0 proposing independence of the variables.

(a) The data are arranged in a 2 × 2-fold contingency table (Table 9.15). The cell frequencies (f_o) of each row are totalled to give the f_r of that row ; the f_o values of each column are similarly totalled to give the f_c of that column. n = Σf_r = Σf_c = 220.

(b) Each of these values is converted to its log_e value using a calculator with an ln key, and entered in Table 9.16.




(c) These values are then used in computing G.

$$G = 2 \left[ \sum (f_o \ln f_o) - \left\{ \sum (f_r \ln f_r) + \sum (f_c \ln f_c) \right\} + n \ln n \right]$$

$$= 2\,[\,60 \times 4.09434 + 40 \times 3.68888 + 95 \times 4.55388 + 25 \times 3.21888 - \{(100 \times 4.60517 + 120 \times 4.78749) + (155 \times 5.04343 + 65 \times 4.17439)\} + 220 \times 5.39363\,]$$

$$= 9.644.$$

df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1.

Critical χ² values (df = 1) are quoted from Table C of Appendix.

χ²_.001(1) = 10.83 ; χ²_.01(1) = 6.64 ; χ²_.05(1) = 3.84.

As the computed G is higher than the critical χ² for the 0.01 level, the probability P of the H_0 being correct is considered too low. So, there is a significant association between diabetes and hypertension (P < 0.01).

Table 9.16. Table for G test of independence of diabetes and hypertension.

                 Nonhypertensive          Hypertensive            f_r         ln f_r
                 f_o       ln f_o         f_o       ln f_o
Diabetic          60       4.09434         40       3.68888       100         4.60517
Nondiabetic       95       4.55388         25       3.21888       120         4.78749
f_c              155                       65                     220 (n)
ln f_c           5.04343                   4.17439                5.39363 (ln n)
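The G formula for a contingency table can be written compactly once the quantity x ln x is factored out. The sketch below is illustrative ; `g_independence` is an assumed name, and the table is the diabetes and hypertension data of Example 9.3.3.

```python
from math import log


def g_independence(table):
    """G test of independence for an r x c table of observed frequencies.

    Implements G = 2[sum(f_o ln f_o) - {sum(f_r ln f_r) + sum(f_c ln f_c)}
    + n ln n]. The function name is an illustrative assumption.
    """
    def xlogx(x):
        # x ln x, taken as 0 when x is 0
        return x * log(x) if x > 0 else 0.0

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    cells = sum(xlogx(f) for row in table for f in row)
    margins = sum(xlogx(t) for t in row_totals) + sum(xlogx(t) for t in col_totals)
    g = 2 * (cells - margins + xlogx(n))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g, df


# Rows: diabetic, nondiabetic; columns: nonhypertensive, hypertensive.
g, df = g_independence([[60, 40], [95, 25]])
print(round(g, 3), df)
```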


Example 9.3.4.

20 out of 48 normal persons and 37 out of 57 neurotics gave positive responses to a test item of a neurotic questionnaire while others of each group gave negative responses to that test item. Is there a significant association between neurotic state and response to the item ?

Solution :

A two-tail G test is undertaken to find the probability P of correctness of the H_0 which contends that the test item has no association with neurotic state.

(a) The data are arranged in the cells, A, B, C and D, of a 2 × 2-fold contingency table (Table 9.17). The cell frequencies (f_o) of each row are totalled to give the f_r of that row. The f_o values of each column are similarly totalled to give the f_c of that column.


Table 9.17. 2 × 2-fold contingency table and corrected f_o scores for neurotic questionnaire data.

                Negative response              Positive response              Total (f_r)
                f_o        Corrected f_o       f_o        Corrected f_o
Neurotic        20 (B)     20.5                37 (A)     36.5                 57
Normal          28 (D)     27.5                20 (C)     20.5                 48
Total (f_c)     48                             57                             105 (n)




(b) Because it is a 2 × 2-fold contingency table with n < 200, Yates' correction is applied to all the f_o scores. Here, the product of f_o scores of cells A and D exceeds that of cells B and C, i.e., AD > BC. So, 0.5 is subtracted from the f_o of each of cells A and D, while 0.5 is added to the f_o of each of B and C to get the corrected f_o scores (Table 9.17). The corrected f_o scores are used for the subsequent computations.

(c) The log_e values of n and of each of the f_o, f_r and f_c scores are worked out with a calculator and entered in Table 9.18. These values are then used in computing G.

Table 9.18. Table for G test of association between response to test item and neurotic state, using corrected f_o values.

                Negative response          Positive response          f_r         ln f_r
                f_o       ln f_o           f_o       ln f_o
Neurotic        20.5      3.02042          36.5      3.59731           57         4.04305
Normal          27.5      3.31419          20.5      3.02042           48         3.87120
f_c             48                         57                         105 (n)
ln f_c          3.87120                    4.04305                    4.65396 (ln n)

$$G = 2 \left[ \sum (f_o \ln f_o) - \left\{ \sum (f_r \ln f_r) + \sum (f_c \ln f_c) \right\} + n \ln n \right]$$

$$= 2\,[\,20.5 \times 3.02042 + 36.5 \times 3.59731 + 27.5 \times 3.31419 + 20.5 \times 3.02042 - \{(57 \times 4.04305 + 48 \times 3.87120) + (48 \times 3.87120 + 57 \times 4.04305)\} + 105 \times 4.65396\,]$$

$$= 4.804.$$

df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1.

Critical χ² values (df = 1) are quoted from Table C of Appendix.

χ²_.01(1) = 6.64 ; χ²_.02(1) = 5.41 ; χ²_.05(1) = 3.84.

As the computed G exceeds the critical χ² for the 0.05 level, it is considered significant below that level. So, there is a significant association between neurotic state and response to the given test item (P < 0.05).
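Yates' correction for a 2 × 2 table, followed by the G computation, can be sketched as below. The function names are hypothetical, the cell labels A to D follow Table 9.17, and the choice to leave the cells unchanged when AD = BC is an assumption, since the text does not cover that case.

```python
from math import log


def yates_corrected_cells(a, b, c, d):
    """Move each cell 0.5 towards independence, following the sign of AD - BC.

    Hypothetical helper; cells are left unchanged when AD = BC, a case
    the text does not cover.
    """
    if a * d == b * c:
        return a, b, c, d
    shift = 0.5 if a * d > b * c else -0.5
    return a - shift, b + shift, c + shift, d - shift


def g_fourfold(a, b, c, d):
    """G for the 2 x 2 table with rows (A, B) and (C, D)."""
    def xlogx(x):
        return x * log(x) if x > 0 else 0.0

    n = a + b + c + d
    cells = xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
    margins = xlogx(a + b) + xlogx(c + d) + xlogx(a + c) + xlogx(b + d)
    return 2 * (cells - margins + xlogx(n))


# A = neurotic positive, B = neurotic negative,
# C = normal positive,   D = normal negative (Table 9.17).
corrected = yates_corrected_cells(37, 20, 20, 28)
g = g_fourfold(*corrected)
print(corrected, round(g, 3))
```

The marginal totals are unaffected by the correction, because each row and each column gains 0.5 in one cell and loses 0.5 in the other.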


9.4 WILCOXON SIGNED RANK TEST

This is a nonparametric alternative to Student's t test for finding the significance of difference between paired observations of single-group and matched-pair experiments. It can be applied to both continuous and discrete variables irrespective of the normality or non-normality of their distribution in the population, even to small samples. For its application, however, it should be justifiable to assume that each of the scores or observations occurs at random in the sample, independent of all other scores. This test is less powerful than the t test.

The H_0 contends that there is no significant difference between the scores of each pair of observations, and that any observed difference results from mere chances of random sampling. The probability P of this H_0 being correct is found out in the following way.

Computation :

(a) The difference between the scores of each pair of observations is worked out. All the non-zero differences are then assigned ranks in an ascending order, according to their absolute magnitudes but ignoring their algebraic signs. If two or more differences have the same absolute value, irrespective of their algebraic signs, each is assigned an average rank which is the mean of the actual ranks that would have been assigned to them if they were not identical but consecutive. In Table 9.19, for example, the differences +2.4 and -2.4 are both considered simply as 2.4 and each of them is assigned the average rank of 9.5 which is the mean of the ranks 9 and 10 that they would have occupied if they differed in their absolute values. In such cases, the difference having the next higher absolute magnitude is given that rank which it would have occupied if the differences immediately preceding it held separate individual ranks instead of an average rank.

(b) The original algebraic signs of the differences are next assigned to their respective ranks.

(c) The ranks with identical algebraic signs are then added together. This gives two sums of ranks for positive and negative ranks respectively. The smaller of these rank sums, in absolute value irrespective of signs, is taken as the statistic T.

Significance :

The significance of the computed T is found out in the following ways. The closer the values of the absolute rank sums are, i.e., the larger the absolute value T of the smaller rank sum, the higher is the probability P of the H_0 being correct.

(i) For small samples with 50 or lower non-zero differences between the paired scores (n ≤ 50), the computed T is compared with the critical T values (T_α) for the given n and for different levels of significance (Table F of Appendix). The significance level of the computed T is given by that of the critical T value which equals or exceeds the computed T ; the lower the latter, the higher is its significance. In other words, the observed difference between the sample means is considered significant (P ≤ α) only when the computed T is equal to or lower than the T_α for the chosen level of significance or probability (T_α ≥ T). There is no significant difference between the sample means (P > α) if T exceeds T_α. The significance level for a one-tail test is half that for a two-tail test.

(ii) For large samples with more than 25 non-zero differences (n > 25), the computed T is transformed into Student's t score, using the mean sum of ranks (T̄) and the SE (s_T).

$$\bar{T} = \frac{n(n+1)}{4} ; \qquad s_T = \sqrt{\frac{n(n+0.5)(n+1)}{12}} ; \qquad t = \frac{T - \bar{T}}{s_T}.$$

The computed t is next compared with critical t scores (df = ∞) for different levels of significance (Table B of Appendix). The observed difference between the sample means is considered significant only when the computed t exceeds or equals the critical t for the chosen level of significance (P ≤ α).

Inaccuracies :

The signed rank test suffers from two inaccuracies : (i) average ranks are used for tied differences instead of their true ranks ; (ii) the differences given consecutive ranks may differ from each other to different extents in magnitude. In Table 9.19, for example, differences ranked as 6 and 7 differ in magnitude by 0.6 while those ranked as 7 and 8 differ by 0.1.


Example 9.4.1.

Blood hemoglobin concentrations (g per dL) of ten pernicious anemia patients, respectively before and after treatment, are presented in Table 9.19. Find whether or not the treatment has produced a significant change in the hemoglobin values (α = 0.05).




Solution :

A two-tail signed rank test is undertaken.

(a) The data are tabulated in Table 9.19. The difference between the scores of each pair of observations is worked out and entered in the table. The non-zero differences are assigned ranks in an ascending order according to their absolute values, ignoring their signs. Average ranks are given to the tied differences ; e.g., the tied differences, +2.4 and -2.4, are each assigned the average rank of 9.5, ignoring their signs.

(b) The original algebraic signs of the differences are then assigned to their respective ranks ; e.g., the differences, +2.4 and -2.4, each given the average rank of 9.5, now bear the signed ranks of +9.5 and -9.5 respectively.

Table 9.19. Signed rank test of hemoglobin data.

Absolute sum of positive ranks : 9.5 + 1 + 2 = 12.5.

Absolute sum of negative ranks : 8 + 7 + 6 + 9.5 + 3.5 + 3.5 + 5 = 42.5.

T = the smaller sum of ranks = 12.5.

(d) The computed T is compared with two-tail critical T values for the given number of non-zero differences in the sample (n = 10). From Table F of Appendix,

α 0.02 : T_α = 5 ; α 0.05 : T_α = 8 ; α 0.10 : T_α = 11.

As the computed T exceeds the critical T for the 0.05 level of significance, the computed T is not significant. There is no significant difference between the hemoglobin values before and after the treatment (P > 0.05).


Example 9.4.2.

Mean corpuscular hemoglobin (MCH in pg) of 26 macrocytic anemia patients, respectively before and after folate therapy, are presented in the 2nd and 3rd columns of Table 9.20. Find whether or not the therapy has produced significant changes in the MCH values.




Solution :

The H_0 contends that the MCH values do not differ significantly between the two groups. A two-tail signed rank test is undertaken to find the probability P of correctness of this H_0.

(a) The difference (X_1 - X_2) between the scores of each pair is worked out and entered in the table. The non-zero differences are assigned ranks in an ascending order according to their absolute values, ignoring their signs. Average ranks are given to the tied differences ; e.g., the tied differences, viz., +2, +2 and -2, are each given the average rank of 3 ignoring their signs.

(b) The original algebraic signs of the differences are then assigned to their respective ranks ; e.g., the differences, viz., +2, +2 and -2, each given the average rank of 3, now bear the signed ranks of +3, +3 and -3, respectively.


Table 9.20. Signed rank test of MCH data.

Patient    MCH before (X_1)    MCH after (X_2)    X_1 - X_2    Signed ranks
 1             34                  31               + 3          + 5
 2             38                  30               + 8          + 16.5
 3             27                  33               - 6          - 10.5
 4             40                  30               + 10         + 22
 5             35                  25               + 10         + 22
 6             32                  30               + 2          + 3
 7             30                  28               + 2          + 3
 8             41                  34               + 7          + 13.5
 9             28                  30               - 2          - 3
10             24                  10               + 14         + 25
11             35                  31               + 4          + 6
12             36                  30               + 6          + 10.5
13             35                  25               + 10         + 22
14             30                  29               + 1          + 1
15             36                  26               + 10         + 22
16             25                  30               - 5          - 7.5
17             22                  30               - 8          - 16.5
18             41                  34               + 7          + 13.5
19             32                  24               + 8          + 16.5
20             38                  30               + 8          + 16.5
21             38                  32               + 6          + 10.5
22             40                  34               + 6          + 10.5
23             28                  10               + 18         + 26
24             15                  25               - 10         - 22
25             31                  22               + 9          + 19
26             28                  33               - 5          - 7.5

(c) The ranks bearing identical algebraic signs are added to give two rank sums, omitting the algebraic signs of the ranks.




Absolute sum of negative ranks = 10.5 + 3 + 7.5 + 16.5 + 22 + 7.5 = 67.

Absolute sum of positive ranks = 5 + 16.5 + 22 + 22 + 3 + 3 + 13.5 + 25 + 6 + 10.5 + 22 + 1 + 22 + 13.5 + 16.5 + 16.5 + 10.5 + 10.5 + 26 + 19 = 284.

$$T = \text{the smaller rank sum} = 67 ; \qquad \bar{T} = \frac{n(n+1)}{4} = \frac{26(26+1)}{4} = 175.5 ;$$

$$s_T = \sqrt{\frac{n(n+0.5)(n+1)}{12}} = \sqrt{\frac{26(26+0.5)(26+1)}{12}} = 39.37 ;$$

$$t = \frac{T - \bar{T}}{s_T} = \frac{67 - 175.5}{39.37} = -2.756, \ \text{or} \ 2.756 ; \qquad df = \infty.$$

Critical t scores (from Table B of Appendix) :

t_.05(∞) = 1.960 ; t_.02(∞) = 2.326 ; t_.01(∞) = 2.576 ; t_.001(∞) = 3.291.

As the computed t of 2.756 (ignoring its sign) is higher than the critical t of 2.576 for the 0.01 level, it is significant below the 0.01 level. There is a significant difference between the MCH values of the two groups ; the therapy has produced significant changes in MCH (P < 0.01).
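The ranking with average ranks for ties, the rank sums and the large-sample t transformation can be sketched as follows. This is an illustrative sketch only : the function names are assumptions, and the two lists are the before and after MCH values as read from Table 9.20.

```python
from math import sqrt


def average_ranks(values):
    """Ascending 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def signed_rank_T(before, after):
    """Return (T, n): smaller absolute rank sum and number of non-zero differences."""
    diffs = [x1 - x2 for x1, x2 in zip(before, after) if x1 != x2]
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(positive, negative), len(diffs)


before = [34, 38, 27, 40, 35, 32, 30, 41, 28, 24, 35, 36, 35,
          30, 36, 25, 22, 41, 32, 38, 38, 40, 28, 15, 31, 28]
after = [31, 30, 33, 30, 25, 30, 28, 34, 30, 10, 31, 30, 25,
         29, 26, 30, 30, 34, 24, 30, 32, 34, 10, 25, 22, 33]

T, n = signed_rank_T(before, after)
mean_T = n * (n + 1) / 4
se_T = sqrt(n * (n + 0.5) * (n + 1) / 12)
t = (T - mean_T) / se_T
print(T, n, mean_T, round(se_T, 2), round(t, 3))
```

Zero differences are dropped before ranking, as the procedure in the text requires, so n counts only the non-zero differences.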


9.5 WILCOXON COMPOSITE RANK 

TEST OR RANK SUM TEST 

This is a fairly good nonparametric 
alternative to Student’s t test for tinding the 
significance of difference between unpaired 
observations of two independent groups. It can 
be applied to both continuous and discrete 
variables, irrespective of their normal or non¬ 
normal distributions in the population, and even 
to small samples. But for its application, it 
should be justifiable to assume that each score 
occurs at random and independent of all other 
scores. It is somewhat less powerful than the t 
test, but has a higher efficiency than the t test 
in case of non-normal distributions. 

The Hq proposes that there is no significant 
difference between the scores of the two 
independent groups, and that any observed 
difference has resulted from mere chances of 
random sampling. The probability P of this H Q 
being correct is found out as follows. 

Computation : 

(a) All the scores of both the groups, taken 


together as a composite group, are ranked in a 
single ascending order of their values. Two or 
more identical or tied scores, whether belonging 
to the same group or to separate groups, are each 
assigned an average rank which is the arithmetic 
mean of the actual ranks that would have been 
given to those scores if they were consecutive 
scores instead of being identical. The score next 
higher than those of a tied set is assigned the 
same rank as it would have got if the tied scores 
had separate consecutive ranks instead of an 
average rank. 

(b) The ranks of the scores of each group
are next totalled separately. This gives two
sums of ranks, $T_1$ and $T_2$, for the two groups.
As the $H_0$ contends that there is no significant
difference between the scores of the two
groups, the ranks should be evenly distributed
between the groups to make $T_1$ equal to $T_2$ if
the $H_0$ is correct.
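The ranking and totalling steps can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function names `average_ranks` and `rank_sums` and the small data set in the usage lines are hypothetical and do not come from the text.

```python
def average_ranks(scores):
    """Rank scores in ascending order; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score is tied with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Ranks are 1-based; a tied run gets the mean of its first and last rank.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_sums(group1, group2):
    """Return the rank sums (T1, T2) of two groups ranked as one composite group."""
    ranks = average_ranks(list(group1) + list(group2))
    t1 = sum(ranks[:len(group1)])
    t2 = sum(ranks[len(group1):])
    return t1, t2


# Hypothetical scores, chosen only to show one tie shared by the two groups.
t1, t2 = rank_sums([28, 30, 31], [25, 28, 27])
print(t1, t2)  # 14.5 6.5
```

The two sums always add up to N(N + 1)/2, where N is the total number of scores, which gives a quick check on the ranking.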

Significance : 

The significance of the rank sums is 
explored in one of the following ways. 



(i) For small groups of unequal sizes ($n_1$ = 3 to 25,
$n_2$ = 3 to 50), the sum of ranks of the
smaller group is taken as the statistic T and
compared with the critical upper and lower T
values ($T_u$ and $T_l$ respectively) for the given
combination of group sizes ($n_1$, $n_2$) and the
chosen level of significance (Table G of
Appendix). If the computed T is found to lie
between $T_u$ and $T_l$, i.e., $T_u > T > T_l$, the $H_0$
cannot be rejected and the means of the two
groups do not differ significantly ($P > \alpha$). If $T_l$
exceeds or equals the computed T, i.e., $T_l \ge T$,
the mean of the smaller group is significantly
smaller than that of the larger one. If T exceeds
or equals $T_u$, i.e., $T \ge T_u$, the mean of the
smaller group is significantly higher than that
of the larger one.
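The comparison with the tabulated critical values can be written as a small decision function. This is a sketch only; the function name `compare_with_critical` and its return strings are hypothetical, and the critical values must still be read from the table for the given group sizes and significance level.

```python
def compare_with_critical(t, t_lower, t_upper):
    """Classify the rank sum T of the smaller group against critical T values.

    t_lower and t_upper are the tabulated lower and upper critical values
    for the given group sizes and the chosen significance level.
    """
    if t <= t_lower:
        return "smaller group significantly lower"
    if t >= t_upper:
        return "smaller group significantly higher"
    return "not significant"


# Illustrative call: T = 55 with tabulated critical values of 61 and 128.
print(compare_with_critical(55, 61, 128))  # smaller group significantly lower
```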

(ii) For small groups of equal size ($n_1 = n_2 < 25$),
the smaller of the two sums of ranks is
taken as the statistic T and compared with the
critical lower T value ($T_l$) for the given
combination of group sizes and the level of
significance chosen for the two-tail or one-tail
test as the case may be. If $T_l \ge T$, there is a
significant difference between the scores of the
two groups ($P \le \alpha$).

(iii) For large groups of equal size ($n_1 = n_2 > 20$),
the critical $T_\alpha$ values are computed for
different levels of significance, using the mean
rank sum ($\bar{T}$), its SE ($s_{\bar{T}}$), and the total size
of the two groups ($N = n_1 + n_2$).

$$\bar{T} = \frac{T_1 + T_2}{2} = \frac{N(N+1)}{4} ; \qquad s_{\bar{T}} = \sqrt{\frac{N\bar{T}}{12}}$$

For two-tail tests, $T_\alpha$ values are given by
$T_\alpha = \bar{T} - t_{\alpha(\infty)} s_{\bar{T}}$ ; thus, for the usual
significance levels,

$T_{.05} = \bar{T} - 1.96 s_{\bar{T}}$ ; $T_{.02} = \bar{T} - 2.33 s_{\bar{T}}$ ;

$T_{.01} = \bar{T} - 2.58 s_{\bar{T}}$ ; $T_{.001} = \bar{T} - 3.29 s_{\bar{T}}$.

Similarly, for one-tail tests,

$T_{.05} = \bar{T} - 1.65 s_{\bar{T}}$ ; $T_{.025} = \bar{T} - 1.96 s_{\bar{T}}$ ;

$T_{.01} = \bar{T} - 2.33 s_{\bar{T}}$ ; $T_{.005} = \bar{T} - 2.58 s_{\bar{T}}$.

The $H_0$ is rejected and the observed
difference is considered significant if the
computed T, i.e., the smaller sum of ranks, is
equal to or lower than the $T_\alpha$ for the chosen
significance level.
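The large-sample critical values can be computed directly from the total size N. The sketch below assumes two equal groups and uses the two-tail multipliers 1.96, 2.33, 2.58 and 3.29; the names `TWO_TAIL_T` and `critical_rank_sums` are hypothetical.

```python
import math

# Two-tail multipliers for the 0.05, 0.02, 0.01 and 0.001 levels.
TWO_TAIL_T = {0.05: 1.96, 0.02: 2.33, 0.01: 2.58, 0.001: 3.29}


def critical_rank_sums(n_total, multipliers=TWO_TAIL_T):
    """Return (mean rank sum, its SE, {alpha: critical T}) for two equal groups.

    n_total is N = n1 + n2, the combined size of the two groups.
    """
    mean_t = n_total * (n_total + 1) / 4
    se_t = math.sqrt(n_total * mean_t / 12)
    critical = {alpha: mean_t - t * se_t for alpha, t in multipliers.items()}
    return mean_t, se_t, critical


mean_t, se_t, critical = critical_rank_sums(42)
print(round(mean_t, 2), round(se_t, 2))  # 451.5 39.75
```

A computed T (the smaller rank sum) at or below `critical[alpha]` would lead to rejection of the null hypothesis at that level.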


Inaccuracies : 

This test suffers from two inaccuracies : (i)
average ranks are used for tied scores instead
of their true ranks ; (ii) no consideration is
given to the fact that the scores having
consecutive ranks may differ from each other
to different extents in magnitude.


Example 9.5.1.

Is the MCH significantly lower in normal persons than in macrocytic anemia patients ? (The MCH values of the two groups are those ranked in Table 9.21.)

Solution :

The $H_0$ contends that the MCH is not significantly lower in normal men than in macrocytic anemia
patients. A one-tail composite rank test is undertaken to find the probability P of this $H_0$ being correct.

(a) Ranks are assigned in an ascending order to the MCH values of both the groups taken together ; average
ranks are given to tied values (Table 9.21). For example, each of two tied scores of 28, one occurring in
each group, is given an average rank of 4.5, which is the mean of the separate ranks 4 and 5 that they would
have got if they were not tied.




Table 9.21. Table for composite rank test of MCH data.

Columns : Normal (Scores, Ranks) ; Macrocytic anemia (Scores, Ranks).

Rank sums : $T_1$ = 55 ; $T_2$ = 155.


(b) The ranks of each group are totalled separately to give the respective rank sums : $T_1$ = 55 ; $T_2$ = 155.

As the two groups differ in size, the rank sum $T_1$ of the smaller group ($n_1$ = 9) is taken as the statistic T.
T = 55.

(c) The computed T is compared with one-tail critical $T_u$ and $T_l$ values for the given combination of
group sizes, viz., $n_1$ = 9 and $n_2$ = 11 (Table G of Appendix).

$\alpha$ 0.005 : $T_u$ = 128 ; $T_l$ = 61.
$\alpha$ 0.025 : $T_u$ = 121 ; $T_l$ = 68.

As the computed T of 55 is lower than the $T_l$ of 61 for $\alpha$ = 0.005, the MCH of the smaller group (normal) is
significantly lower than that of the larger group (anemic) (P < 0.005).


Example 9.5.2.

The strengths of the kneejerk reflex of athletes and nonathletes are given in Table 9.22. Does the mean
kneejerk strength differ significantly between the two groups ?

Solution :

A two-tail composite rank test is applied.

(a) Ranks are assigned in an ascending order to the scores of both the groups taken together ; average ranks
are given to tied scores (Table 9.22). For example, each of two tied scores of 35, one occurring in each
group, is given the average rank of 18.5, which is the mean of the separate ranks 18 and 19 that they would
have got if they were not tied.

(b) The ranks of each group are totalled separately to give the respective rank sums : $T_1$ = 129 ; $T_2$ = 81.
As the samples are of equal size ($n_1 = n_2$ = 10), the smaller sum of ranks, viz., $T_2$, is taken as the statistic T.

T = 81.




(c) Two-tail critical $T_l$ values ($n_1 = n_2$ = 10) for different significance levels are quoted from Table G of
Appendix.

$\alpha$ 0.01 : $T_l$ = 71.
$\alpha$ 0.05 : $T_l$ = 79.

Table 9.22. Table for composite rank test of kneejerk data.

Columns : Athletes (Kneejerk strengths, Ranks) ; Nonathletes (Kneejerk strengths, Ranks).

Because $T > T_l$ even at the 0.05 level, the $H_0$ cannot be rejected. There is no
significant difference between the kneejerk strengths of the two groups (P > 0.05).


Example 9.5.3.

The memory test scores of two groups are given in the first and third columns, respectively,
of Table 9.23. Is there any significant difference between the mean memory scores of the two groups ?

Solution :

A two-tail composite rank test is undertaken.

(a) Ranks are assigned in an ascending order to the scores of both the groups taken together, giving
average ranks to the tied scores (Table 9.23).

(b) The ranks of each group are totalled separately to give the respective sums of ranks : $T_1$ = 587.5 ;
$T_2$ = 315.5. The smaller of the two rank sums, viz., $T_2$, is taken as the statistic T because the groups are of
equal size ($n_1 = n_2$ = 21). T = 315.5.




Table 9.23. Table for composite rank test of memory scores.

            Group A                          Group B
Memory scores ($X_1$)   Ranks    Memory scores ($X_2$)   Ranks
        20              15.5             21              17.5
        33              33.5             18              12
        35              37               17              10
        34              35                7               1
        25              21.5              9               3
        38              41               18              12
        25              21.5             20              15.5
        27              23.5             23              19
        24              20               35              37
        31              29.5             31              29.5
        33              33.5             16               9
        37              40               14               8
        32              31.5             11               4
        30              28               12               5.5
        29              26               12               5.5
        19              14               27              23.5
        18              12               35              37
        21              17.5             32              31.5
        29              26               13               7
        36              39               29              26
        39              42                8               2
Total                  587.5 ($T_1$)                    315.5 ($T_2$)


(c) Because the samples are large ($n_1 = n_2 > 20$), the two-tail critical $T_\alpha$ values are computed as follows
for different levels.

$N = n_1 + n_2 = 21 + 21 = 42$.

$$\bar{T} = \frac{N(N+1)}{4} = \frac{42 \times 43}{4} = 451.5 ; \qquad s_{\bar{T}} = \sqrt{\frac{N\bar{T}}{12}} = \sqrt{\frac{42 \times 451.5}{12}} = 39.75.$$

$T_{.05} = \bar{T} - 1.96 s_{\bar{T}} = 451.5 - 1.96 \times 39.75 = 373.6$ ;

$T_{.01} = \bar{T} - 2.58 s_{\bar{T}} = 451.5 - 2.58 \times 39.75 = 348.9$ ;

$T_{.001} = \bar{T} - 3.29 s_{\bar{T}} = 451.5 - 3.29 \times 39.75 = 320.7$.

Because the computed T of 315.5 is lower than even $T_{.001}$, the computed T is significant below the 0.001
level. There is a significant difference between the mean scores of the groups (P < 0.001).




Example 9.5.4.

The grip strengths (kg) of 21 athletes and 21 nonathletes are given in respectively the 1st and 3rd
columns of Table 9.24. Is there any significant difference between the grip strengths of the two groups ?

Solution :

A two-tail composite rank test is undertaken.

(a) Ranks are assigned in an ascending order to the grip strengths of both the groups taken together,
giving average ranks to the tied values (Table 9.24).


Table 9.24. Composite rank test of grip strength data.

Columns : Athletes (Grip strengths, Ranks) ; Nonathletes (Grip strengths, Ranks).


(b) The ranks of each group are totalled separately to give the respective rank sums (Table 9.24).
Because the samples are of equal size, the smaller rank sum is taken as the statistic T.

$T_1$ = 594 ; $T_2$ = 309 ; $n_1 = n_2$ = 21 ; $N = n_1 + n_2$ = 21 + 21 = 42 ;

T = smaller rank sum = 309.




(c) Because the groups are large (> 20), two-tail critical $T_\alpha$ values are computed as follows for different
levels.

$$\bar{T} = \frac{T_1 + T_2}{2} = \frac{594 + 309}{2} = 451.5 ; \qquad s_{\bar{T}} = \sqrt{\frac{N\bar{T}}{12}} = \sqrt{\frac{42 \times 451.5}{12}} = 39.75.$$

So,

$T_{.05} = \bar{T} - 1.96 s_{\bar{T}} = 451.5 - 1.96 \times 39.75 = 373.59$ ;

$T_{.02} = \bar{T} - 2.33 s_{\bar{T}} = 451.5 - 2.33 \times 39.75 = 358.88$ ;

$T_{.01} = \bar{T} - 2.58 s_{\bar{T}} = 451.5 - 2.58 \times 39.75 = 348.95$ ;

$T_{.001} = \bar{T} - 3.29 s_{\bar{T}} = 451.5 - 3.29 \times 39.75 = 320.72$.

As the computed T of 309 is lower than even $T_{.001}$, the computed T is significant below the 0.001 level.
There is a significant difference between the mean grip strengths of athletes and nonathletes (P < 0.001).


9.6 MANN-WHITNEY U TEST 

This is an efficient nonparametric alternative 
to Student’s t test. Its power is only slightly 
lower than that of the t test, but is higher than 
that of the composite rank test or the median 
test. It can be applied to both continuous and 
discrete measurement variables, irrespective of 
normality or non-normality of their distributions 
in the population, and also to very small 
samples. It should, however, be justifiable to 
assume that each score occurs in the sample at 
random and independent of all other scores. It 
is particularly used to test the significance of
differences between unpaired observations of
two independent groups of unequal sizes ($n_1 \ne n_2$).
It is worked out as follows.

(a) Ranks are assigned in an ascending
order to all the scores of both the groups taken
together, giving average ranks to tied scores.
Like all other rank tests, the U test suffers
from two inaccuracies due to (i) the assignment
of average ranks to tied scores, and (ii)
the ranking of scores ignoring the magnitudes
of differences between them.

(b) The ranks of each group are totalled
separately to give the respective sums of ranks,
$R_1$ and $R_2$.



(c) Either of the rank sums may be used in
computing the Mann-Whitney statistic U.

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$
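The two U statistics follow directly from the rank sums and group sizes. The short sketch below is illustrative only; the function name `mann_whitney_u` is hypothetical and the input values in the usage line are arbitrary rank sums for groups of 11 and 7 scores.

```python
def mann_whitney_u(r1, r2, n1, n2):
    """Return (U1, U2) computed from the rank sums R1, R2 and sizes n1, n2."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


# Rank sums of 122 and 49 for groups of 11 and 7 scores.
u1, u2 = mann_whitney_u(122, 49, 11, 7)
print(u1, u2)  # 21.0 56.0
```

Because the rank sums together equal N(N + 1)/2, the two statistics always satisfy U1 + U2 = n1 n2, which serves as an arithmetic check.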



(d) The $H_0$ contends that there is no
significant difference between the scores of the
two groups and they belong to the same
population. This leads to the proposition that
$R_1$ and $R_2$ are identical, any observed
difference between them being due to mere
chances of random sampling. The probability
of this $H_0$ being correct may be worked out in
two alternative ways.

(i) For small groups : If any of the groups
consists of less than 8 cases, the smaller of the
two statistics, $U_1$ or $U_2$, is compared with the
critical U values at chosen $\alpha$ levels and for the
specific combination of $n_1$ and $n_2$ (Table E of
Appendix). Critical one-tail and two-tail U
values are used for respectively one-tail and
two-tail Mann-Whitney tests. Only if the
smaller computed U is equal to or lower than
the critical U at the chosen $\alpha$, the $H_0$ is
rejected and the two means are considered to
differ significantly.





(ii) For large groups : If both the groups
consist of 8 or more cases, the value $U_e$, as
expected from the $H_0$, is first computed.

$$U_e = \frac{n_1 n_2}{2}$$

The z score is then computed from any of
the two U values worked out from the
respective sums of ranks. Where $s_u$ is the SE
of U,

$$s_u = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} ;$$

$$z = \frac{U_1 - U_e}{s_u} \quad \text{or,} \quad z = \frac{U_2 - U_e}{s_u}$$

The two z scores would have the same
absolute value, but one of them would bear the
positive sign, and the other the negative sign.
Ignoring the algebraic sign of the computed z
score, the probability P of the $H_0$ being
correct is calculated using the unit normal
curve areas (Table A of Appendix). For a two-tail
test,

P = 2 [0.5000 - (fractional area of unit normal
curve from $\mu$ to the computed z)].

For a one-tail test,

P = 0.5000 - (fractional area of unit normal
curve from $\mu$ to the computed z).

The difference between the group means is
considered significant only if P is equal to or
lower than the chosen $\alpha$ ($P \le \alpha$).
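The large-group procedure can be sketched in Python as follows. Instead of reading the area from a printed table, this sketch obtains the tail area of the unit normal curve from the complementary error function in the standard library, so its P values may differ slightly from those found with a four-figure table. The function name `u_test_normal_approx` is hypothetical.

```python
import math


def u_test_normal_approx(u, n1, n2, two_tail=True):
    """Return (z, P) for a Mann-Whitney U using the normal approximation."""
    u_e = n1 * n2 / 2                                # U expected under H0
    s_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # SE of U
    z = (u - u_e) / s_u
    # Area of the unit normal curve beyond |z| in one tail.
    one_tail_p = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    p = 2 * one_tail_p if two_tail else one_tail_p
    return z, p


# Illustrative call: U = 23 for groups of 9 and 11 scores.
z, p = u_test_normal_approx(23, 9, 11)
print(round(z, 2), round(p, 3))  # -2.01 0.044
```

Either U value may be passed in; the two give z scores of equal size and opposite sign, and hence the same P.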



Example 9.6.1.

The winglengths (mm) of two samples of houseflies are given below.

Sample 1 : 3.9, 4.3, 4.7, 3.7, 4.2, 4.1, 4.8, 5.3, 4.9, 5.2, 5.5.

Sample 2 : 4.8, 3.6, 4.5, 3.9, 4.6, 4.0, 3.8.

Is there a significant difference between the means of the two samples ? ($\alpha$ = 0.02)

Solution :

A two-tail Mann-Whitney U test is done.

(a) The scores of both samples taken together are assigned ranks in an ascending order, giving average
ranks to the tied scores (Table 9.25).

(b) The ranks of each sample are totalled separately to give the respective sums of ranks, $R_1$ and $R_2$.

$R_1$ = 122 ; $R_2$ = 49.

(c) The statistic U is computed from both rank sums, $R_1$ and $R_2$.

$n_1$ = 11 ; $n_2$ = 7.

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 = 11 \times 7 + \frac{11 \times 12}{2} - 122 = 21.$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2 = 11 \times 7 + \frac{7 \times 8}{2} - 49 = 56.$$




Table 9.25. U test of winglength data.

          Sample 1                         Sample 2
Winglengths ($X_1$)   Ranks      Winglengths ($X_2$)   Ranks
       3.9             4.5              4.8            13.5
       4.3             9                3.6             1
       4.7            12                4.5            10
       3.7             2                3.9             4.5
       4.2             8                4.6            11
       4.1             7                4.0             6
       4.8            13.5              3.8             3
       5.3            17
       4.9            15
       5.2            16
       5.5            18



Critical U = 12.

Because the computed $U_1$ of 21 is higher than the critical U at the chosen significance level, there is no
significant difference between the sample means (P > 0.02).

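Example 9.6.2.

The strengths of kneejerk (degrees of arc) in relaxed condition in 9 athletes and 11 nonathletes were
found to be as follows.

Athletes : 31, 30, 22, 30, 26, 28, 19, 36, 37.

Nonathletes : 35, 26, 14, 20, 11, 14, 21, 31, 27, 24, 10.

Is there a significant difference between the mean strengths of kneejerk of the two groups ? ($\alpha$ = 0.05)

Solution :

A two-tail U test for large groups is undertaken.

(a) The scores of both groups taken together are assigned ranks in an ascending order, giving average
ranks to the tied scores (Table 9.26).

(b) The ranks of each group are totalled separately to give the respective sums of ranks : $R_1$ = 121 ; $R_2$ = 89.

(c) The statistic U is computed from any of the rank sums. Thus, using $R_1$,

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 = 9 \times 11 + \frac{9 \times 10}{2} - 121 = 23.$$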




Table 9.26. U test of kneejerk data.

            Athletes                            Nonathletes
Kneejerk strengths ($X_1$)   Ranks    Kneejerk strengths ($X_2$)   Ranks
          31                 16.5               35                 18
          30                 14.5               26                 10.5
          22                  8                 14                  3.5
          30                 14.5               20                  6
          26                 10.5               11                  2
          28                 13                 14                  3.5
          19                  5                 21                  7
          36                 19                 31                 16.5
          37                 20                 27                 12
                                                24                  9
                                                10                  1
Total                       121 ($R_1$)                            89 ($R_2$)


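(d) The value of $U_e$ expected from the $H_0$ is then computed.

$$U_e = \frac{n_1 n_2}{2} = \frac{9 \times 11}{2} = 49.5.$$

(e) The difference ($U_1 - U_e$) is next transformed into the z score.

$$s_u = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{9 \times 11 \times (9 + 11 + 1)}{12}} = 13.16.$$

$$z = \frac{U_1 - U_e}{s_u} = \frac{23 - 49.5}{13.16} = -2.01 ; \text{ i.e., } z = 2.01.$$

(f) The two-tail probability P of the $H_0$ being correct is
worked out using the unit normal curve areas (Table A).

P = 2 [0.5000 - (area of unit normal curve from its $\mu$ to the computed z of 2.01)]
= 2 [0.5000 - 0.4778] = 0.0444.

$\alpha$ = 0.05.

As $P < \alpha$, there is a significant difference between the mean kneejerk strengths of the two groups
(P < 0.05).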

Example 9.6.3.

The memory scores of two groups are given in Table 9.27. Is there any significant
difference between the mean memory scores of the two groups ?

Solution :

As each of the groups has more than 8 cases, a two-tail U test for large groups is undertaken.

(a) The scores of both groups taken together are assigned ranks in an ascending order, giving average
ranks to the tied scores (Table 9.27). The ranks of each group are then totalled separately to give the
respective rank sums.

$R_1$ = 143.0 ; $R_2$ = 47.0 ; $n_1$ = 10 ; $n_2$ = 9.

(b) The statistic U is worked out using any of the rank sums. Thus, using $R_2$,

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2 = 10 \times 9 + \frac{9 \times 10}{2} - 47 = 88.$$

(c) The value of $U_e$ expected from the $H_0$ as well as the SE ($s_u$) of U is worked out and used in
computing the z score.

$$U_e = \frac{n_1 n_2}{2} = \frac{10 \times 9}{2} = 45 ; \qquad s_u = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{10 \times 9 \times (10 + 9 + 1)}{12}} = 12.25.$$

$$z = \frac{U_2 - U_e}{s_u} = \frac{88 - 45}{12.25} = 3.51.$$

(d) The two-tail P of the $H_0$ being correct is worked out using the unit normal curve areas (Table A of
Appendix).

P = 2 [0.5000 - (area of unit normal curve from its $\mu$ to the computed z of 3.51)]
= 2 [0.5000 - 0.4998] = 0.0004.

As P is found to be lower than even 0.0005, the $H_0$ is rejected and it is inferred that there is a
significant difference between the mean memory scores of the two groups (P < 0.0005).




9.7 MEDIAN TEST

This is a nonparametric alternative to
Student’s t test for the significance of
differences between unpaired observations of
two or more independent groups. It is less
powerful than the t test and the Mann-Whitney
U test. It can be applied to both continuous
and discontinuous variables, irrespective of the
normality or non-normality of their population
distributions, to groups of identical or unequal
sizes, and also to small groups or samples. But
it should be justifiable to assume that each
score occurs in the sample at random and
independent of all other scores.

The $H_0$ contends that all the sets of scores
have come from the same population and
consequently have an identical median. The $H_0$
proposes in effect that there are equal numbers
of scores in each group above and below the
common median, any observed deviation from
this distribution being due to mere chances of
random sampling. The probability P of the
correctness of this $H_0$ is found out by a chi
square test of independence.

(a) A common median is first computed for
the scores of all the groups (pages 43-44).

(b) In each group, positive (+) signs are
given to the scores higher than the common
median while negative (-) signs are assigned to
those equal to or lower than the median. For
each group, the frequencies of scores with
positive and negative signs are counted
separately and entered as the respective $f_o$
values in a contingency table. The latter is
framed with two columns representing the
positive and negative deviations of scores from
the median, and as many rows as the number
of samples or groups ; a 2 x 2-fold
contingency table results in case of two
samples only.

(c) A chi square test of independence (pages
203-204) is then performed, computing the
expected frequencies ($f_e$) of the deviations on
the basis of the $H_0$. Where $f_r$ and $f_c$ are the
marginal totals, i.e., the totals of cell
frequencies of rows (r) and columns (c)
respectively, and n is the total number of scores
of all the groups, the $f_e$ of any cell of the
contingency table is given by :

$$f_e = \frac{f_r f_c}{n} ; \qquad df = (r - 1)(c - 1) ; \qquad \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

In case of a 2 x 2-fold contingency table, an
alternative formula, using the $f_o$ values of its
cells, A, B, C and D, may be applied.

$$\chi^2 = \frac{n(AD - BC)^2}{(A + B)(A + C)(B + D)(C + D)}$$

(d) The computed $\chi^2$ is compared with the
critical $\chi^2$ values for the computed df at
different significance levels. If it equals or
exceeds the critical $\chi^2$ for the chosen $\alpha$, it is
considered significant.
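The whole median test can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions: the function name `median_test` is hypothetical, the common median is taken with the standard library's plain `median` (no interpolation within a tied interval), and every column total is assumed to be non-zero. The data in the usage lines are invented for illustration.

```python
from statistics import median


def median_test(groups):
    """Return (common median, contingency table, chi square, df).

    Each row of the table is [negative deviations, positive deviations],
    where scores equal to or lower than the median count as negative.
    """
    pooled = [x for g in groups for x in g]
    mdn = median(pooled)
    table = [[sum(x <= mdn for x in g), sum(x > mdn for x in g)] for g in groups]
    n = len(pooled)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n   # expected frequency
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(table) - 1) * (len(col_totals) - 1)
    return mdn, table, chi2, df


# Hypothetical scores for two small groups.
mdn, table, chi2, df = median_test([[1, 2, 3, 10], [4, 11, 12, 13]])
print(mdn, table, chi2, df)  # 7.0 [[3, 1], [1, 3]] 2.0 1
```

For two groups the value returned agrees with the 2 x 2 shortcut formula; the computed chi square would then be compared with the tabulated critical value for the returned df.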


Example 9.7.1.

Following scores were obtained by 14 male and 16 female students in an English-usage test.

Males : 22, 27, 29, 30, 32, 34, 39, 45, 46, 49, 49, 50, 51, 51.

Females : 22, 23, 24, 26, 29, 31, 32, 32, 33, 35, 37, 38, 40, 40, 52, 53.

Use the median test to find whether the test scores differ significantly in the two sexes.





Solution :

A two-tail median test is undertaken.

(a) A common median is computed for the scores of both the samples.

$n = n_1 + n_2 = 14 + 16 = 30$. Mdn = $\frac{n+1}{2}$th score = $\frac{30+1}{2}$th or 15.5th score.

Counting off the scores of both the groups taken together and in an ascending order,

15th score = 34 ; 16th score = 35 ; Mdn = $\frac{34 + 35}{2}$ = 34.5.

(b) The scores of each group are assigned positive (+) and negative (-) signs according as they are
respectively higher than and lower than (or equal to) the Mdn.

Males : (i) Scores given negative signs : 22, 27, 29, 30, 32, 34.
Total number of negative deviations : 6.
(ii) Scores given positive signs : 39, 45, 46, 49, 49, 50, 51, 51.
Total number of positive deviations : 8.

Females : (i) Scores given negative signs : 22, 23, 24, 26, 29, 31, 32, 32, 33.
Total number of negative deviations : 9.
(ii) Scores given positive signs : 35, 37, 38, 40, 40, 52, 53.
Total number of positive deviations : 7.

(c) A 2 x 2-fold contingency table is framed for computing $\chi^2$ and the numbers of the two types of
deviations are entered as $f_o$ values in its respective cells (Table 9.28). The marginal totals of cell frequencies
of each row and each column are entered as $f_r$ and $f_c$ in the table.

(d) Because it is a 2 x 2-fold contingency table, $\chi^2$ may be computed straightway from the $f_o$ in the cells,
A, B, C and D, of the table and the marginal totals $f_r$ and $f_c$.

Table 9.28. 2 x 2-fold contingency table for median test.

Groups         Negative deviations ($f_o$)   Positive deviations ($f_o$)   Total ($f_r$)
Males                  6 (B)                        8 (A)                 14 (A + B)
Females                9 (D)                        7 (C)                 16 (C + D)
Total ($f_c$)         15 (B + D)                   15 (A + C)             30 (n)

$$\chi^2 = \frac{n(AD - BC)^2}{(A + B)(A + C)(B + D)(C + D)} = \frac{30(8 \times 9 - 6 \times 7)^2}{14 \times 15 \times 15 \times 16} = 0.536.$$

$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$,

where r and c are respectively the numbers of rows and columns of the table, containing $f_o$ values.

(e) Critical $\chi^2$ scores are quoted from Table C of Appendix.

$\chi^2_{.01(1)} = 6.64$ ; $\chi^2_{.05(1)} = 3.84$.

Because the computed $\chi^2$ is lower than the critical $\chi^2$ for even the 0.05 level, it is not significant. So,
test scores of the two sexes do not differ significantly (P > 0.05).




Example 9.7.2.

The strengths of patellar reflex (in ° radian) were found as follows in three groups of young men.

Group I : 17, 19, 22, 25, 25, 30, 31, 33, 33, 34, 36, 37, 37. ($n_1$ = 13).

Group II : 14, 16, 18, 20, 21, 21, 23, 25, 29, 30, 30, 35. ($n_2$ = 12).

Group III : 13, 15, 17, 19, 21, 23, 24, …, 30, 31, 31, 32. ($n_3$ = 12).

Find if there is any significant difference between the scores of the three groups.

Solution :

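(a) A common median is first computed for the scores of all three groups.

$n = n_1 + n_2 + n_3 = 13 + 12 + 12 = 37$.

Mdn = $\frac{n+1}{2}$th score = $\frac{37+1}{2}$th or 19th score.

Counting off the scores of all the groups taken together and in an ascending order, the 19th score is
found to be one of the tied scores of 25. The tied scores of 25 are assumed to
occupy one unit interval extending from 24.5 to 25.5, and the median is located within that interval.

Mdn = 24.5 + 0.33 = 24.83.

(b) The scores of each group are assigned positive (+) and negative (-) signs according as they are
respectively higher than and lower than (or equal to) the Mdn.

Group I : (i) Scores given negative signs : 17, 19, 22.
Total number of negative deviations : 3.
(ii) Scores given positive signs : 25, 25, 30, 31, 33, 33, 34, 36, 37, 37.
Total number of positive deviations : 10.

Group II : (i) Scores given negative signs : 14, 16, 18, 20, 21, 21, 23.
Total number of negative deviations : 7.
(ii) Scores given positive signs : 25, 29, 30, 30, 35.
Total number of positive deviations : 5.

Group III : (i) Scores given negative signs : 13, 15, 17, 19, 21, 23, 24.
Total number of negative deviations : 7.
(ii) Scores given positive signs : …, 30, 31, 31, 32.
Total number of positive deviations : 5.

(c) A 3 x 2-fold contingency table is framed, entering the number of each type of deviations in its
respective cells as the $f_o$ values and the marginal totals of rows and columns as $f_r$ and $f_c$ (Table 9.29).
With $df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$, the $f_e$ values of only two cells need to be computed from the
marginal totals. For example, for the cell giving positive deviations for Group I,

$f_r$ = 13 ; $f_c$ = 20 ;

$$f_e = \frac{f_r f_c}{n} = \frac{13 \times 20}{37} = 7.0.$$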




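Similarly, for the cell giving negative deviations for Group III,

$f_r$ = 12 ; $f_c$ = 17 ;

$$f_e = \frac{f_r f_c}{n} = \frac{12 \times 17}{37} = 5.5.$$

The $f_e$ values for the remaining cells are obtained by subtracting the already computed $f_e$ values from
the respective marginal totals (Table 9.29).

(d) The values of $f_o$ and $f_e$ of each group are used in computing $\chi^2$.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(3 - 6.0)^2}{6.0} + \frac{(10 - 7.0)^2}{7.0} + \frac{(7 - 5.5)^2}{5.5} + \frac{(5 - 6.5)^2}{6.5} + \frac{(7 - 5.5)^2}{5.5} + \frac{(5 - 6.5)^2}{6.5} = 4.30.$$

Table 9.29. 3 x 2-fold contingency table for $\chi^2$ test of patellar reflex data.

Groups         Negative deviations      Positive deviations      Total ($f_r$)
               $f_o$      $f_e$         $f_o$      $f_e$
Group I          3        6.0            10        7.0              13
Group II         7        5.5             5        6.5              12
Group III        7        5.5             5        6.5              12
Total ($f_c$)   17                       20                         37

$\chi^2_{.05(2)} = 5.99$.

As the computed $\chi^2$ is found to be lower than the critical $\chi^2$ for even the 0.05 level, it is
not significant. Thus, there is no significant difference between the scores of the three groups (P > 0.05).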

GLOSSARY 

chi square : nonparametric statistic given by the sum of the ratios of squared deviations of the observed 
frequencies of a distribution from the frequencies, expected from a proposed distribution, and the 
respective expected frequencies. 

chi square test : nonparametric analysis of frequencies to find whether or not an observed frequency
distribution differs significantly from a proposed frequency distribution. 

chi square test for goodness of fit : nonparametric test for finding whether or not an observed frequency 
distribution fits significantly with an expected frequency distribution based on a proposed distribution 
like the normal, binomial or Mendelian distribution. 

chi square test of independence : nonparametric test to find whether or not there is a significant 
association between two variables. 


composite rank test : nonparametric test to find the significance of difference between means of two equal- 
size independent groups, using the sums of the ranks (of the respective groups) given in a composite 
manner to the scores of both groups taken together. 

G test : log likelihood ratio test for nonparametric analysis of frequencies to find if an observed distribution
differs significantly from a proposed distribution.

G test for goodness of fit : log likelihood ratio test for finding whether or not an observed distribution fits
significantly with an expected distribution, based on some proposed distribution like the normal,
binomial or Mendelian distribution.

G test of independence : log likelihood ratio test for finding whether or not there is a significant
association between two variables.


Mann-Whitney U test : nonparametric test to find if there is a significant difference between means of two
unequal independent groups, using the sums of the ranks (of the respective groups) given in a
composite manner to the scores of both groups taken together.

median test : nonparametric test for finding the significance of difference between two or more independent
groups, using a common median of those groups.

nonparametric statistic : a statistic worked out without using any precomputed statistic as an estimate of a
parameter.

signed rank test : nonparametric test for finding the significance of difference between means in a single-
group/matched-pair group experiment, worked out by giving ranks bearing the respective algebraic
signs to the differences between the paired scores of each individual or case.

Yates' correction : correction to be applied on each difference between the observed and the expected
frequencies in a chi square test, if any expected frequency is less than 5 and the chi square moreover
has the df of 1 only.



10. PSYCHOLOGICAL TEST CONSTRUCTION

Psychological tests are undertaken mainly to
study individual differences in behavioral or
psychological variables such as intelligence,
memory, aptitude, ability, personality, attitude,
aspiration, anxiety and emotionality.

Psychological measurements are expressed
in various ways such as the speed of response,
the number of correct responses, the number of
trials for achieving a given performance level,
and the average number of items remembered
after a brief exposure. Units equidistant on a
psychological scale are assumed to represent
identical differences in the given psychological
variable.

10.1 PSYCHOLOGICAL VARIABLES

Psychological variables include such
variables, many of which cannot be observed
directly from outside, can only be inferred from
expressions, behaviours and verbal reports of
individuals, and consequently depend for proper
evaluation on the cooperation of the subjects
involved. They include intelligence, memory,
aptitude, ability, attitude, aspiration, anxiety,
emotions, personality and motivation. Many of
them are hypothetical and abstract in nature,
cannot be precisely measured on quantitative
scales, and can only be assessed qualitatively.
Even when quantitatively measurable, some of
these variables have an interval scale with an
arbitrary zero point instead of a real zero ;
however, some psychological variables, such as
the ratios of psychophysical stimuli and calory
expenditures in job activities in industrial
psychology, are measured quantitatively in ratio
scales with real zero points (page 3).

Psychological experiments involve variables
such as dependent, independent, extraneous,
relevant and intervening variables.

Dependent variables

This is the behavioral response to be
measured or studied in an experiment after
exposure of the subjects to different levels of
the independent variable(s), for assessing the
effects of the latter on that behavioral response.
Dependent variable can be measured by the
number of correct responses to a stimulus, the
time taken to react to the given stimulus, and
the accuracy of performance. Sometimes, the
dependent variable is measured by objective
tests using rating scales. Thus, the dependent
variable is often the measured behavioral
response to the given independent variable(s),
which constitutes mostly quantitative and
continuous data, but sometimes qualitative data
as in personality tests of projective type.

Independent variables

Independent variables are deliberately
chosen by investigators and used in
experiments for studying their effects on
specific dependent variables. In psychological
experiments, they belong to two types.

(a) Organismic variables include (i)
physical characteristics of the subjects such as
their sex, age, eye color, body build, height and
weight, any of which may be chosen as the
independent variable by the investigator, and
(ii) psychological characteristics of the subjects
such as their intelligence, personality,
drive, emotionality, neuroticism, extroversion,
aspiration level, motivation, anxiety, tension and
frustration. In correlational research, the
independent variables used are mainly such
physical or psychological characteristics of the
subjects. Such independent variables can rarely
be manipulated directly (or "fixed") by the
investigator ; for example, the latter cannot
directly manipulate the intelligence, personality,





age or sex of a subject. So, such independent variables are liable to random changes and may be considered as classification variables (page 5). Nevertheless, such a variable can be manipulated indirectly through a selection procedure like the choice of subjects with specific required levels of intelligence.

(b) Stimulus variables consist of such environmental events including both physical and social variables, which stimulate specific receptors of the subjects to affect the dependent variable, viz., a specific behaviour of the subjects. The investigator can directly manipulate (or "fix") the stimulus variable chosen as the independent variable, such as changes in the intensity of the stimulating light, in the number of syllables offered in memory experiments, in the color of the light stimulus in an experiment on after-images, in the pitch of a sound stimulus or in the decibels of noise used as the independent variable in experiments on attention, or in the instructions for reaction time experiments. Such stimulus variables, being under the manipulative control of the investigator, are not liable to random changes and may be considered as "fixed" treatment variables (page 5).

Extraneous variables

These are numerous variables which occur or arise in the physical or social environment, in the subjects under study, or in the experimental procedure, but are not intended to be used as the dependent and independent variables in the experiment being undertaken. Extraneous variables include such physical and social environmental factors as well as such physical and psychological characteristics of the subjects, as are other than the dependent and independent variables. They are additional variables happening to occur in any experiment [...] and the size of printed material [...] in a memory experiment, none of which [...] or deliberately used by the investigator for the purposes of the experiment. According as the dependent variable may or may not be affected by such variables, they belong to two classes.

(a) Relevant variables : These are such extraneous variables which, though not deliberately used or intended to be used by the investigator to study their effects on the dependent variable in that particular experiment, occur spontaneously, can influence and affect the dependent variable, and may consequently defeat the purpose of the experiment to study the effect of only the independent variable on the dependent one. The investigator must remain vigilant about these relevant variables and must control them as far as possible so as to minimize their effects on the dependent variable. They can be sought to be controlled by methods like constancy of experimental conditions, balancing and counterbalancing, randomization, matching, and changes in design in multiple-group experiments. Relevant variables are further classified into subject relevant, situational relevant and sequence relevant variables.

(i) Subject relevant variables are organismic variables owing to both physical characteristics [...] and psychological characteristics (like [...] intelligence, neuroticism, [...] and motivational aspects) of the subjects under study. It is very hard to control these organismic variables because of the [...] in assessing them from outside [...] manipulating them directly.

(ii) Situational relevant variables are those which [...] situation and the [...]. They can mostly be controlled in the [...]




(iii) Sequence relevant variables arise from the sequence of applications of the independent variable and include fatigue, [...] monotony, etc. (See page 6 also.)

(b) Irrelevant variables : These are such extraneous variables which do not perceivably affect the dependent variable. For example, hair color, eye color, skin complexion or economic condition of the subjects may be considered irrelevant variables in an experiment to study the effect of practice on memory.

Intervening variables

It is difficult to identify and control some such psychological organismic variables as act side by side with the independent variable, without the investigator being aware of that, and affect the dependent variable. In an instrumental conditioning experiment, drive may be such a variable, and it may link an independent variable like food deprivation with a dependent variable such as the behavioral modification. In industrial work, lack of motivation may be such a variable and the real cause of decrement of the output rather than physiological fatigue considered as the independent variable affecting the dependent variable of industrial production. In an experiment studying the effect of intelligence level on achievement, anxiety may be such a variable to influence achievement (dependent variable) side by side with intelligence (independent variable), and consequently needs to be neutralized first. Such unobserved hypothetical variables, assumed to be associated with independent variables like intelligence, motivation, aspiration and habit, are based on constructs and are called intervening variables.

10.2 PRINCIPLES OF TEST CONSTRUCTION

This chapter deals mainly with the tests of ability. Essentials of test construction are outlined below.

(a) Area to be assessed : First, the specific area of ability to be assessed should be identified.

(b) Selection of test items : A number of test items should be chosen for each [...] to enable the proper and efficient exploration of the intended area of ability. A dichotomously scored test item is scored as +1 or 0 according to right or wrong answers respectively, and the total of item scores gives the test score.

(c) Item analysis : As items are selected depending on their difficulty values and discriminatory powers, item analysis should be undertaken to determine these two properties of each test item (§ 10.4).

(d) Arrangement of test items : The test items selected by item analysis should be serially arranged in a suitable order according to the type of test (§ 10.3).

(e) Reliability : Reliability of a test is the consistency of results on its repeated applications on the same sample under identical conditions. Before the test is put to actual use, its reliability should be estimated by applying it to a properly drawn representative sample (§ 10.5).

(f) Validity : Validity is the capacity of a test to measure a specific variable in exclusion of others. It should next be estimated from the scores of the test in a representative sample (§ 10.6). Factor analysis should be undertaken for the construct validity of a test (§ 10.8).

(g) Standardization : Methods of administration and scoring should be precisely laid down for the test and standardized (§ 10.9).

(h) Establishment of norms : A norm is the [...] of transformed scores. For interpreting the test


scores, the scores of a representative group 
should be statistically treated to establish the 
norms (§ 10.9). 
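The dichotomous scoring rule in step (b) above (+1 for a right answer, 0 for a wrong one, with the test score as the total of item scores) can be sketched as follows. The answer key and the responses are hypothetical, invented only for illustration.

```python
def score_test(responses, key):
    """Score one examinee: +1 for each answer matching the key, 0 otherwise.

    Returns the list of item scores and their total (the test score).
    """
    item_scores = [1 if given == correct else 0
                   for given, correct in zip(responses, key)]
    return item_scores, sum(item_scores)


# Hypothetical 5-item multiple-choice test (illustrative data only).
key = ["b", "d", "a", "c", "a"]
responses = ["b", "d", "c", "c", "b"]

item_scores, total = score_test(responses, key)
print(item_scores, total)  # [1, 1, 0, 1, 0] 3
```

The same list of 1/0 item scores is the raw material for the item analysis described in § 10.4.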


10.3 POWER AND SPEED TESTS

Tests belong to two types, power tests and 
speed tests, according as they differentiate 
between individuals in terms of difficulty levels 
of items and the speed of performance, 
respectively. 

In power tests , the test items are usually 
grouped according to the types of contents and 
are arranged serially in an increasing order of 
difficulty. Enough time is given to enable at 
least 75% of the subjects being tested to 
attempt all the items. Still, some may fail to 
answer all the test items because of the 
difficulty level. Achievement tests are pure 
power tests. 


In speed tests, all the test items are of 
uniform difficulty level, and the subject being 
tested has to answer the test items within a 
stipulated time which is so short that none can 
answer all the items in time. The level of 
ability of the subject is determined by the 
latter's speed of performance.


In most tests, however, both speed and
power are mixed in varying degrees. So, there
is no rigid line of demarcation. Estimation of
reliability depends on the proportions of power
and speed in the test.


10.4 ITEM ANALYSIS 

Most tests are composed of test items. This chapter is mostly concerned with test items of ability tests. Item analysis is essential for selecting suitable items which confirm and enhance the reliability and validity of the test and contribute towards the goal of the test. Properties like score distribution, mean and variance of the total test scores depend on the properties of the items ; so the need for item analysis.


Item analysis determines the difficulty value and the discriminatory power of each test item ; the discriminatory power of the test is, in a broader sense, an index of validity. For such determination,

(a) the psychological processes, involved in the attribute under study, are first analyzed ;

(b) a list of items is next prepared with more items than what is required for the given test ;

(c) these items are then administered to a representative sample on a trial basis ;

(d) the selection of test items is finally done quantitatively by determining for each item : (i) the difficulty value given by the percentage or proportion of individuals answering a test item correctly, and (ii) the discriminatory power which is the ability of a test item to discriminate between individuals with respect to the given variable ;

(e) it should be ensured that the test items contribute equally in predicting the psychological variable.

Item analysis is done for the final selection of items for a test closer to power test ; but it is useless for items meant for a pure speed test ; for the latter, the items are instead rotated in different orders so that every item has an equal chance of going to the last part of the test and the limited time allotted is balanced.

In ability tests, items are selected depending on the difficulty value of dichotomously [...] items. For non-ability tests, items should better be selected by considering the item [...] correlation as well as the [...]

1. Difficulty value

This is determined for a test item by the proportion of right answers to that item by the individuals of a sample. Where $n$ is the total number of individuals in the sample, $n_p$ the number of individuals passing or giving the right answer to the given test item, and $p$ and $q$ are proportions of individuals of the sample giving respectively right and wrong answers to the item,

$p = \dfrac{n_p}{n}$ ; $q = 1 - p$.


Some important properties of the difficulty 
value are discussed below. 

(a) The difficulty level of any item is 
actually inversely related to its p value. The 
latter (p) represents the average item score as
well as the mean index of difficulty for the 
individuals. For individuals of the same 
population, items with an identical p value are 
considered as equally difficult. 

(b) Because a dichotomous item is scored 
as 1 or 0, p is also the mean score of all 
candidates in that item and is given by dividing 
the total scores of both successful and 
unsuccessful candidates in that item by the 
sample size n. The greater is this mean score p
of the item, the lower its difficulty value and 
discriminatory power. Thus, p is a direct 
measure of the easiness of an item and an 
indirect measure of its difficulty. If 60 out of
150 persons answer correctly the item 2 of a 
test of 7 items, the p value of this item is 
given by 60/150 or 0.40 which is the 
proportion of “pass” in this item. This item 
would be considered less difficult than another 
with the p value of 0.30, but more difficult 
than one with the p value of 0.70. 

(c) The variance ($s^2$) and the standard
deviation ($s$) of the scores in an item depend
on its p value.

$s^2 = pq$ ; $s = \sqrt{pq}$.

Thus, the p value affects the nature and
shape of the score distribution of the item.

It is desirable to choose an item with p
value of 0.50 so that its scores have a variance
of 0.25.

$p = 0.50$ ;

$q = 1 - p = 1 - 0.50 = 0.50$ ;

$s^2 = pq = 0.50 \times 0.50 = 0.25$.

Such p value provides information about 
each individual passed or failed. In contrast, an 
item with p value of l or 0 gives little 
information about individual differences. 
However, it is difficult to discriminate 
individuals with different scores if all the items 
possess an identical p value of 0.50, because 
the items of a test, expected to reveal the same 
attribute, would be correlated with each other. 

It is, therefore, preferable to select test items 
having an average p value of 0.50, but with p 
values of individual items moderately spread 
out around that average. 
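The relations p = n_p/n, q = 1 - p, s² = pq and s = √(pq) used above can be checked with a short sketch. The function name is illustrative; the first call uses the figures quoted in the text (60 of 150 persons passing), the second shows the maximum item variance at p = 0.50.

```python
import math


def item_difficulty(n_pass, n):
    """Return p, q, item variance (pq) and item SD for a dichotomous item."""
    p = n_pass / n          # proportion giving the right answer
    q = 1 - p               # proportion giving the wrong answer
    variance = p * q        # s^2 = pq
    return p, q, variance, math.sqrt(variance)


p, q, variance, sd = item_difficulty(60, 150)
print(round(p, 2), round(q, 2), round(variance, 2))   # 0.4 0.6 0.24

# An item passed by exactly half the sample has the largest variance, 0.25.
print(round(item_difficulty(75, 150)[2], 2))          # 0.25
```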

(d) The p value is related indirectly to the
reliability of a test, because a high p value is 
associated with a high item-to-item correlation. 

(e) For dichotomously scored multiple
choice items, the p value is affected by both 
item difficulty and guessing or chance factor. 
The chance factor would enable some 
individuals to hit upon the right answer to a 
test item by guesswork. Guessing would thus 
cause the obtained proportion p of passing an 
item to exceed the real proportion and the item 
will consequently appear to be relatively easy. 
The obtained p of right answers to the item 
should, therefore, be corrected for the guess 
factor by Guilford’s formula to give the correct 
proportion $p_c$, using the number k of alternative
choices for answer to the item : 
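As a hedged illustration of such a correction, the sketch below assumes the commonly cited form of the correction for guessing, p_c = (kp - 1)/(k - 1), which amounts to subtracting from the proportion of right answers the proportion of wrong answers divided by (k - 1). This form is an assumption supplied for the example and should be checked against the formula intended here; the function name is likewise illustrative.

```python
def corrected_proportion(p, k):
    """Correct an obtained pass proportion p for guessing on a k-choice item.

    Assumed form: p_c = (k*p - 1) / (k - 1), equivalent to p - q/(k - 1).
    """
    if k < 2:
        raise ValueError("a multiple-choice item needs at least 2 alternatives")
    return (k * p - 1) / (k - 1)


# An obtained p of 0.60 on a 5-alternative item corresponds to a corrected 0.50.
print(round(corrected_proportion(0.60, 5), 3))    # 0.5
# The same corrected value needs an obtained p of 0.625 with 4 alternatives.
print(round(corrected_proportion(0.625, 4), 3))   # 0.5
```

Under this assumed form, the corrected proportion is always lower than the obtained one whenever some answers are wrong, which matches the statement that guessing makes an item appear easier than it is.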


(f) Rise in the number of alternative
answers to a multiple-choice test item would 
increase the difficulty level of the item owing 
to the decline in the chance factor. So, to avoid 
the lowering of reliability owing to the guess 
factor, the desired uncorrected p should be set 




lower for an item affected by guess factor than 
that for an item free from the guess factor, but 
progressively higher with the rise in the 
number of alternative answers. Thus, where p 
may be set at about 0.69 for a 5-alternative
item, it should be set at about 0.67 for a 4-alternative one.

(g) Difficulty levels, determined by p values, 
have no linear relationship with ability. The p 
values simply show the relative values of items 
and do not express the absolute difference 
between two items, because units of p are not 
equal-interval ones. So, equal differences in the 
proportions (p) do not express an identical 
difference in difficulty. For this, the p values 
should be transformed into z scores using the 
unit normal curve area. This produces a linear 
scale with increasing order of difficulty. It also 
provides a correction for chance factor in 
multiple-choice items. 

If, for example, 25%, 32% and 40% of a group of students were successful in answering test items A, B and C respectively, it indicates that A is more difficult than B while C is the easiest of the three. Assuming a normal distribution of the ability measured by the test, the relative difficulties of A, B and C can be converted to σ values and shown in a normal distribution with equal-interval units. Thus, 25% or 0.2500 of the unit normal curve area in its right or high-value tail represents the proportion (p) of students successful in A. Fractional area of the unit normal curve from its centre to the lower limit of this area for success in A is given by the difference between half the normal curve area and the area for success in the item : 0.5000 - 0.2500 = 0.2500. On referring this area to the unit normal curve table, it is seen that this area corresponds to the interval μ + 0.675σ (Table 6.2, Fig. 6[...] (c) and Fig. 10.1). Thus, 0.675σ is the lower limit of the area for success in A and gives the σ value of difficulty for A (Table 10.1). Similarly, 32% or 0.3200 of the normal curve area in its right tail gives the proportion (p) of students successful in B. The fractional area of the unit normal curve from its centre to the lower limit of this area for success in B amounts to : 0.5000 - 0.3200 = 0.1800, which corresponds to the interval μ + 0.468σ (Table A of Appendix). Thus, 0.468σ is the lower limit of success in B and gives the σ value of difficulty for B (Table 10.1). Again, 40% or 0.4000 of the unit normal curve area in its right tail represents the proportion (p) of students successful in item C. The fractional area of the unit normal curve from its centre to the lower limit of this area for success in C amounts to : 0.5000 - 0.4000 = 0.1000, which corresponds to the interval μ + 0.253σ (Table A). Thus, 0.253σ is the lower limit of success in C and gives the σ value of difficulty for C (Table 10.1).

Table 10.1. σ values of difficulty of test items.

Test item    % success    σ value of difficulty    σ difference
A            25           0.675σ                   A - B = 0.207σ
B            32           0.468σ                   B - C = 0.215σ
C            40           0.253σ

It is seen from the above example that the higher the p value of an item, the lower is the z score for the lower limit of the corresponding fractional area in the right tail of the unit normal curve. In the example cited above,

item A : p = 0.25 ; z = 0.675σ ;

item B : p = 0.32 ; z = 0.468σ ;

item C : p = 0.40 ; z = 0.253σ.

Thus, where p = 0.50, z = 0.00σ, i.e., the μ of the unit normal curve. It, therefore, follows that the proportions (p) of successful candidates, having negative and positive z scores, amount respectively to >0.50 and <0.50.

(h) It is possible to calculate the proportion of students expected to be successful in test item D, if D is more difficult than A by twice the relative difference in difficulty between B and C in terms of their σ difference.

σ value of difficulty for D
= (σ value for A) + 2 × (σ difference of B - C)
= 0.675σ + 2 × 0.215σ
= 1.105σ.

So, +1.105σ marks the lower limit for the unit normal curve area corresponding to the proportion p of students successful in item D. Fractional area of the unit normal curve from its centre to +1.105σ amounts to 0.3654 (Table A of Appendix). So, the remaining area in the right tail beyond 1.105σ amounts to : 0.5000 - 0.3654 = 0.1346. This area gives the probability of cases passing in the item D. Thus, 13.5% of the candidates are expected to pass in item D.
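The table look-ups in (g) and (h) can be reproduced without a printed normal curve table. The sketch below uses `statistics.NormalDist` from the Python standard library; the function names are illustrative. The σ value of difficulty is the z score that leaves the proportion p in the right tail, and the expected pass proportion for item D is the right-tail area beyond its σ value.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1


def sigma_difficulty(p):
    """z score whose right-tail area under the unit normal curve equals p."""
    return unit_normal.inv_cdf(1 - p)


def proportion_passing(z):
    """Right-tail area beyond z, i.e. the expected proportion of passes."""
    return 1 - unit_normal.cdf(z)


# Items A, B and C of the example (25%, 32% and 40% success).
for item, p in {"A": 0.25, "B": 0.32, "C": 0.40}.items():
    print(item, round(sigma_difficulty(p), 3))

# Item D: harder than A by twice the B - C difference, as computed in the text.
z_d = 0.675 + 2 * 0.215
print(round(z_d, 3), round(proportion_passing(z_d), 3))
```

The computed values agree with the tabled 0.675σ, 0.468σ and 0.253σ to within rounding of the third decimal place, and the right-tail area beyond 1.105σ comes to about 0.135.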

2. Discriminatory value

The discriminatory value of an item indicates its ability to discriminate between the subjects of a test, belonging to upper and lower categories with respect to their answers to that item. The discriminatory value of a test may be estimated by (i) item validity measured by item-criterion correlation, (ii) item-total correlation between the scores in that item and total scores of the test, and (iii) index of discrimination, according to the purpose of the test.

(1) Item validity or item-criterion correlation :

This is determined by correlating the scores of a given test item with a criterion. A high correlation with the criterion indicates a high discriminatory value of that item. The criterion to be chosen depends on the type of validity envisaged for the test : content, construct or predictive validity. When emphasis is laid on construct validity, the criterion used is the total score of the same test or of any other equivalent test. Measures of job performance, teacher's rating or grade performance may be used as criterion for content validity. For tests of interest and personality, validation is done by correlating the items with a suitable external criterion revealing the given attribute. For achievement tests, the criterion used may consist of grade performances or teacher's ratings. Academic achievement is often used as the criterion for intelligence test. When emphasis is laid on predictive validity, such an external criterion is used as consists of the measures of everyday success in the variable under investigation.

(2) Internal consistency or item-total correlation :

The test items may sometimes be correlated with the total scores of that test for internal consistency (item-total correlation). The homogeneity of a test depends upon the highest average item-total correlation.

According to the type of validity required and the nature of variable, point biserial r, biserial r, tetrachoric r or phi coefficient may be computed to correlate the scores in the item with either the total test scores or the criterion revealing the desired attribute.

(a) Point biserial r or $r_{pbi}$ is used in correlating the scores of a continuous variable like the total test scores with a genuinely




dichotomous variable such as the right/wrong, yes/no and 1/0 answers to a test item (§ 8.7). Thus, it may be used for item-total correlation for item analysis (Example 8.7.3). $r_{pbi}$ is also preferred for correlating a dichotomously answered test item with an external criterion revealing the attribute under investigation (item-criterion correlation), if the predictive power of the item for the given attribute is to be explored.

(b) Biserial r or $r_b$ is used in correlating the scores of a continuous variable like the total test scores with the answers to an apparently dichotomous test item. In such cases, $r_b$ is applied for item-total correlation (§ 8.8 and Example 8.8.1). Such item-total correlation measured by $r_b$ is independent of the difficulty value of the test item. $r_b$ is also computed between a dichotomized test item and the continuous scores of an external criterion in order to assess whether the item measures the same attribute as represented by the criterion (item-criterion correlation).

(c) Phi coefficient (φ) is used for item-to-item
correlation between two dichotomous test
items, each scored as yes/no, right/wrong or 1/0
(§ 8.9). It is also used in correlating a
dichotomously scored test item and a genuinely 
dichotomous external criterion such as success/ 
failure in a public examination (item-criterion 
correlation) so as to explore the power of the 
test item to predict one of the two categories of 
the dichotomous attribute represented by the 
criterion. 

Phi coefficient may be computed directly from the observed frequencies ($f_o$) of cases in different combinations of the classes of two dichotomous variables (either two dichotomous items or an item and a dichotomous criterion) arranging these classes along the rows and columns of a 2 × 2-fold contingency table (Example 8.9.2). However, it is often computed also from the proportions of cases in different combinations of the classes of two dichotomous variables being correlated, for example, the proportions of cases passing or failing in one or both test items being correlated (Example 10.4.1). In the latter case, where a, b, c and d are the proportions of cases in the four cells of the 2 × 2-fold contingency table, $p_1$ and $p_2$ are the proportions of passes or right answers in items 1 and 2 respectively, and $q_1$ and $q_2$ are respectively the proportions of failures or wrong answers in those items (Table 10.3),

$$\phi = \frac{ad - bc}{\sqrt{(a + b)(a + c)(b + d)(c + d)}} = \frac{ad - bc}{\sqrt{p_1 p_2 q_1 q_2}}$$

or,

$$\phi = \frac{a - p_1 p_2}{\sqrt{p_1 p_2 q_1 q_2}}.$$

When an item-criterion correlation is worked out with a criterion dichotomized at its median into upper and lower groups, each with a proportion of 0.50 of the sample size (P = Q = 0.50), and $p_u$ and $p_l$ are the proportions of pass in the respective groups, with p and q as the proportions of pass and failure in the item for the whole sample,

$$\phi = \frac{p_u - p_l}{2\sqrt{pq}}.$$

(d) Tetrachoric r or $r_t$ is computed for item-criterion correlation to measure the extent to which a dichotomized test item and a dichotomized criterion estimate the same attribute, when the test item and/or the attribute may be considered artificially dichotomized (§ 8.10). It is also used in correlating the items of two questionnaires scored on scales other than dichotomous responses. It may also be applied to personality tests with dichotomized test items. However, because of its higher SE and lower reliability than product-moment r and a complex procedure for its significance test, $r_t$ is seldom used in item analysis.
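Of the coefficients listed above, the point biserial r is the simplest to compute by hand. The sketch below assumes the usual textbook form, r_pbi = ((M_p - M_q)/s_t)·√(pq), where M_p and M_q are the mean total scores of those passing and failing the item and s_t is the standard deviation of all total scores computed with n in the denominator. That formula is not reproduced in this section (see § 8.7), so treat it as an assumption; the data are hypothetical.

```python
import math
from statistics import mean, pstdev


def point_biserial(item_scores, total_scores):
    """Item-total correlation for a 1/0 item (assumed textbook formula).

    Requires at least one pass and one failure, and some spread in totals.
    """
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(passed) / len(item_scores)   # proportion passing the item
    q = 1 - p                            # proportion failing the item
    s_t = pstdev(total_scores)           # SD with n in the denominator
    return (mean(passed) - mean(failed)) / s_t * math.sqrt(p * q)


# Hypothetical data: 8 examinees, one dichotomous item and their total scores.
item = [1, 1, 1, 0, 0, 1, 0, 1]
totals = [9, 8, 7, 4, 5, 6, 3, 8]
print(round(point_biserial(item, totals), 2))
```

With these made-up scores the examinees passing the item have clearly higher totals, so the coefficient is high and positive; numerically it is the same as the product-moment r between the 1/0 item scores and the totals.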

(3) Index of discrimination (D) :

This may also be used in determining the discriminatory value of an item. D expresses the difference between the proportions ($p_u - p_l$) of candidates of the upper and lower groups of a dichotomized criterion passing in the given test item. D ranges from -1.00 to +1.00 in value. It is mainly used in item analysis of tests to be conducted with small groups such as a test administered to a class by a teacher. To find the difference (U - L) between percentages of the upper and lower criterion groups passing in the test item, usually 27% of individuals are selected from both these groups in case of large normally distributed samples. The (U - L) difference between the proportions of these two selected batches passing in a test item gives the index of discrimination of that item, indicating its discriminatory power. The index of discrimination should preferably be used for test items with the difficulty value of 0.50.

(4) Factor analysis :

Some psychologists prefer factor analysis for assessing the discriminatory value (§ 10.8).


Example 10.4.1.

In a psychological test, 35 persons gave right answers to each of two test items, 25 persons answered both wrongly, 20 persons gave right answers to item 2 and wrong answers to item 1, and 10 persons answered item 2 wrongly but item 1 correctly. Is there any item-to-item correlation between the two items?

Solution :

(a) A 2 × 2-fold contingency table is framed (Table 10.2) with its top and bottom rows for respectively right and wrong answers of one item, and its right and left columns for respectively those of the other item. The frequencies of persons, giving [...] combinations of right and wrong answers to [...], are entered in the respective cells for such combinations. For example, the frequency of persons answering item 1 rightly and item 2 wrongly, is entered in the top left cell or cell B. The marginal totals $f_r$ and $f_c$ are worked out for the rows and columns respectively.

Table 10.2. Fourfold contingency table for right and wrong answers to two test items.

                    Item 2
Item 1          wrong         right         Total ($f_r$)
right           10 (B)        35 (A)        45 (A + B)
wrong           25 (D)        20 (C)        45 (C + D)
Total ($f_c$)   35 (B + D)    55 (A + C)    90 (n)

(b) All the frequencies are converted to the respective proportions by dividing them with the sample size (n = 90) and arranged in Table 10.3.

(c) Phi coefficient is computed using the cell proportions (a, b, c and d) and the marginal proportions ($p_1$, $p_2$, $q_1$ and $q_2$) of Table 10.3.

$$\phi = \frac{ad - bc}{\sqrt{p_1 p_2 q_1 q_2}} = \frac{0.389 \times 0.278 - 0.111 \times 0.222}{\sqrt{0.500 \times 0.611 \times 0.500 \times 0.389}} = +0.34.$$

[Alternatively,

$$\phi = \frac{a - p_1 p_2}{\sqrt{p_1 p_2 q_1 q_2}} = \frac{0.389 - 0.500 \times 0.611}{\sqrt{0.500 \times 0.611 \times 0.500 \times 0.389}} = +0.34.]$$



(d) The computed φ is converted to $\chi^2$ to be compared with critical $\chi^2$ scores.

$\chi^2 = n\phi^2 = 90 \times (0.34)^2 = 10.40$ ; df = 1.

Critical $\chi^2$ : $\chi^2_{0.001(1)} = 10.83$ ; $\chi^2_{0.01(1)} = 6.64$ (Table [...]).

Because the computed $\chi^2$ is higher than the critical $\chi^2$ for 0.01 level, it is significant and there is a significant item-to-item correlation (P < 0.01).


Table 10.3.

                Item 2
Item 1      wrong            right            Total
right       0.111 (b)        0.389 (a)        0.500 ($p_1$)
wrong       0.278 (d)        0.222 (c)        0.500 ($q_1$)
Total       0.389 ($q_2$)    0.611 ($p_2$)    1.000
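The arithmetic of Example 10.4.1 can be checked with a short sketch that works directly from the cell frequencies of Table 10.2 rather than from rounded proportions. The function name is illustrative; the critical values 6.64 and 10.83 are those quoted in the example.

```python
import math


def phi_from_frequencies(A, B, C, D):
    """Phi coefficient from a fourfold table.

    A = both items right, D = both wrong, B and C = the two mixed cells,
    using the same cell letters as Table 10.2.
    """
    numerator = A * D - B * C
    denominator = math.sqrt((A + B) * (C + D) * (A + C) * (B + D))
    return numerator / denominator


A, B, C, D = 35, 10, 20, 25
n = A + B + C + D
phi = phi_from_frequencies(A, B, C, D)
chi_square = n * phi ** 2            # chi-square with df = 1

print(round(phi, 2))                 # 0.34
print(round(chi_square, 2))          # 10.52
print(6.64 < chi_square < 10.83)     # True
```

Carrying the unrounded φ forward gives a slightly larger chi-square than the 10.40 obtained from φ rounded to 0.34; the conclusion is unchanged, since the value still lies between the critical values for the 0.01 and 0.001 levels.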


Example 10.4.2.

3 items were chosen out of 50 items of a psychological test administered to 200 students. The upper and lower 27% of the students, scoring the highest and the lowest scores respectively in the test, were chosen to form respectively the U and L groups, each consisting of 54 students. The numbers of students of each group [...] each of the chosen items [...].

Solution :

[...] P = 21 ; n = 54 ; [...]

The index of discrimination of a test item is the difference between the proportions of pass in the upper and the lower groups (U and L). For example, for item 1, the proportions of pass in the two groups amount to :

U = 0.74 ; L = 0.26 ;

∴ D = U - L = 0.74 - 0.26 = 0.48.

Table 10.4. Computation of the index of discrimination of test items. [...]

Thus, it may be inferred from the [...] scores that item 1 has the highest discriminatory power and item 2 has no power of discrimination.
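The computation of D = U - L in this example can be sketched as below. The pass counts of 40 and 14 are hypothetical, chosen only because they reproduce the proportions U = 0.74 and L = 0.26 quoted for item 1 with groups of 54; the function name is likewise illustrative.

```python
def discrimination_index(pass_upper, pass_lower, group_size):
    """Return U, L and D = U - L for one test item."""
    U = pass_upper / group_size   # proportion of the upper group passing
    L = pass_lower / group_size   # proportion of the lower group passing
    return U, L, U - L


# 27% of 200 examinees gives groups of 54, as in the example.
group_size = round(0.27 * 200)

# Hypothetical pass counts for item 1 (assumed, not taken from the source).
U, L, D = discrimination_index(40, 14, group_size)
print(group_size, round(U, 2), round(L, 2), round(D, 2))   # 54 0.74 0.26 0.48
```

An item passed by equal numbers in both groups gives D = 0, i.e. no power of discrimination, and an item passed more often by the lower group gives a negative D.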



10.5 RELIABILITY OF A TEST

Reliability of a test is the consistency of results when it is repeated on the same individual [...] under identical [...] those of other equivalent forms of the test [...] to the same group. Reliability [...] stability of scores which tend to [...] because of errors of measurement. Errors of measurement may be either random or systematic.

(a) Random errors :

Reliability is affected mostly by random errors. [...] factors such as : errors of [...] interpretation ; fluctuating mood, [...] or interest of the subject ; [...] and other distractions. Sampling of test [...] may be [...] of random errors [...]. The time gap between administrations of [...] of a test as well as differences in their contents and scoring may result in random errors. In multiple-choice tests, guessing definitely serves as a source of random errors due to variations in item-to-item performance, and lowers the reliability.

Random errors of measurement should be minimized by proper item sampling, large sample size, clear instructions for proper test administration, and objective scoring and interpretation.

Random errors may be positive as well as negative, not systematically unidirectional for all the scores. The mean random error will be zero if a large number of trials is undertaken. Random errors are assumed to be uncorrelated with each other and with the true scores.

Variations in the scores of an individual, on repetition of the same test [...], may result from [...] random [...]. [...] expected average of the scores [...] number of [...] assumed [...] unchanged because it is supposed [...] random error ($X_e$) [...] of an individual differs from $X_\infty$ due to $X_e$ [...] should always correspond closely to $X_\infty$ [...] in terms [...] expected to be normally [...]

or, $X_e = X_t - X_\infty$.

Reliability depends on the closeness between $X_t$ and $X_\infty$ with only a minimal $X_e$.

(b) Systematic errors :

Systematic or constant errors affect mainly the validity of a test, not its reliability. They result from defects in test construction, limited time allotment to an ability test, some personal modes of response and environmental variations. Systematic errors act unidirectionally, causing either overestimation or underestimation of the scores in all cases.

Reliability coefficient

Reliability may be statistically defined as the proportion of true variance ($s_\infty^2$) in the total variance of the scores of a test. In that context, reliability coefficient ($r_{tt}$) is a measure of correlation between $s_\infty^2$ and $s_t^2$ of the test scores. The total variance ($s_t^2$) of scores of a test estimates the deviations of individual scores ($X_t$) from their mean ($\bar{X}_t$). It is the mean squared deviation of $X_t$ scores from $\bar{X}_t$.




$$s_t^2 = \frac{\sum (X_t - \bar{X}_t)^2}{n}.$$

A part of $s_t^2$ is due to the true variance ($s_\infty^2$) while its other part comes from the error variance ($s_e^2$). The latter may arise from errors of measurement and is not the same as sampling errors. In absence of any correlation between $s_\infty^2$ and $s_e^2$,

$$s_t^2 = s_\infty^2 + s_e^2 ; \quad \therefore \frac{s_\infty^2}{s_t^2} + \frac{s_e^2}{s_t^2} = 1.00.$$

The lower is the proportion of $s_e^2$ in $s_t^2$, the closer is the true variance $s_\infty^2$ to the total variance and consequently, the closer are $X_t$ and $X_\infty$ and the higher the reliability of that test. Indeed, $r_{tt}$ is the proportion of $s_\infty^2$ in $s_t^2$ and can be considered as a self-correlation of a test.

$$r_{tt} = \frac{s_\infty^2}{s_t^2} = 1 - \frac{s_e^2}{s_t^2} ; \quad \text{or, } s_e^2 = s_t^2 (1 - r_{tt}).$$

Thus, an $r_{tt}$ of 0.70 indicates that $s_\infty^2$ and $s_e^2$ constitute respectively 0.70 and 0.30 proportions of $s_t^2$.

$r_{tt}$ suffers from errors of measurement. It ranges from -1.00 to +1.00, depending on the type and objective of the test as well as the nature of the group tested. It may lie in the 0.50-0.60 range for a group within a [...] range of school grades ; but $r_{tt}$ should lie in the 0.90-0.95 range for individual [...] and classification. Its magnitude and [...] sign are interpreted like those of Pearson's r.

Reliability depends on the close agreement between the obtained score ($X_t$) and the true score $X_\infty$. The standard error of measurement ($s_e$) is the square root of the error variance ($s_e^2$) and is thus another measure of random errors.

$$s_e^2 = s_t^2 (1 - r_{tt}) ; \quad s_e = s_t \sqrt{1 - r_{tt}}.$$

When $s_e$ is zero or negligible, the test is perfectly reliable ($r_{tt} = 1$) and $X_t$ coincides with $X_\infty$ ; differences between $X_t$ scores are then solely due to genuine differences between $X_\infty$ scores. But when $s_e$ equals the square root ($s_t$) of the total variance, the test has no reliability ($r_{tt} = 0$).

Whereas $r_{tt}$ is a measure of self-correlation of a test with itself, the index of reliability ($r_{t\infty}$) is a measure of correlation between the whole obtained score $X_t$ and its part ($X_\infty$) which is the true score. Thus, $r_{t\infty}$ is a part-whole correlation : $r_{t\infty} = \sqrt{r_{tt}}$. $r_{t\infty}$ is a better measure for comparing the reliabilities of different test scores.


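Example 10.5.1.

For a psychological test, $r_{tt} = 0.75$ and $s_t = 10$. Compute the error variance, the standard error of measurement and the true variance, and their proportional contribution to the total variance.

Solution :

$r_{tt} = 0.75$ ; $s_t = 10$. $\therefore s_t^2 = 10^2 = 100$.

$s_e^2 = s_t^2(1 - r_{tt}) = 100(1 - 0.75) = 25$. $\therefore s_e = \sqrt{s_e^2} = \sqrt{25} = 5$.

$s_\infty^2 = r_{tt}s_t^2 = 0.75 \times 100 = 75$.

$\dfrac{s_e^2}{s_t^2} = \dfrac{25}{100} = 0.25$ ; $\dfrac{s_\infty^2}{s_t^2} = \dfrac{75}{100} = 0.75$.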
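The arithmetic of Example 10.5.1 can be sketched in Python. This is only an illustration of the relations $s_e^2 = s_t^2(1 - r_{tt})$ and $s_\infty^2 = r_{tt}s_t^2$ stated above; the function name `decompose_variance` and the dictionary keys are hypothetical labels chosen for this sketch.

```python
import math


def decompose_variance(r_tt, s_t):
    """Split the total score variance into true and error parts.

    r_tt : reliability coefficient of the test
    s_t  : standard deviation of the obtained scores
    """
    total_variance = s_t ** 2
    error_variance = total_variance * (1 - r_tt)   # s_e^2 = s_t^2 (1 - r_tt)
    true_variance = total_variance * r_tt          # s_inf^2 = r_tt * s_t^2
    return {
        "total_variance": total_variance,
        "error_variance": error_variance,
        "true_variance": true_variance,
        "standard_error_of_measurement": math.sqrt(error_variance),
        "index_of_reliability": math.sqrt(r_tt),   # r_t_inf = sqrt(r_tt)
    }


# Data of Example 10.5.1
result = decompose_variance(r_tt=0.75, s_t=10)
for name, value in result.items():
    print(name, round(value, 3))
```

The two proportions follow by dividing the error and true variances by the total variance.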







Estimation of reliability

Three measures of reliability, viz., coefficient of stability, coefficient of equivalence and coefficient of internal consistency, have been recognized by the American Psychological Association. These are reliability coefficients ($r_{tt}$) and are estimated respectively by the test-retest method, the alternate-forms or parallel-forms method, and the split-half method. The methods differ with respect to the contributions of $s_\infty^2$ and $s_e^2$ in their computation. The basic relations are given by the following formulae :

$$r_{tt} = \frac{s_\infty^2}{s_t^2}; \qquad \text{or, } r_{tt} = 1 - \frac{s_e^2}{s_t^2}.$$


1. Test-retest method :

In this method, the same test is repeated on the same group of individuals after a suitable interval and the two sets of scores are correlated for computing the coefficient of stability or the coefficient of retest reliability ; the coefficient of stability determines the dependability of the measurement over a time interval. The error variance ($s_e^2$) measured by this coefficient is an estimate of random variations of scores over a time interval due to either uncontrolled testing conditions such as change of sets and weather, poor visibility or audibility, noise and distractions, or personal factors such as lack of motivation, guess work, fatigue, boredom, anxiety and awareness about earlier mistakes. Maturity factors during the time gap may also contribute to $s_e^2$. Closeness of the two sets of scores indicates a low $s_e^2$ and a high stability coefficient ($r_{tt}$).

In this method, an optimum time interval should be chosen for retesting. A short interval would apparently increase $s_\infty^2$ to enhance $r_{tt}$ due to the memory effect. On the contrary, if the interval is too long, differential rates of [...] and emotional changes of the testee may affect the scores, raising $s_e^2$ and lowering $r_{tt}$ ; the interval should not exceed six months. $r_{tt}$ may also be enhanced due to the unchanged content which raises $s_\infty^2$.

The test-retest method can be used for tests of psychomotor abilities, sensory discriminations and such cognitive skills as are less affected by practice and memory effects. It is suitable for speed tests. It can also be applied to heterogeneous tests where items measure different abilities and have high correlations with a criterion, but a low correlation with other items of the test, making the reliability of internal consistency insignificant.

Various types of reliabilities of ratings such as performance evaluation and personal assessment ratings can be estimated with the help of the coefficient of stability. This can be done by working out (i) inter-rater reliability, i.e., the consistency of the results when the same individual is rated by two or more raters, and (ii) rate-rerate reliability, i.e., the consistency of the results when the same individual is rated repeatedly by the same rater over a time gap.

The coefficient of stability ($r_{tt}$) gives no indication about the internal consistency of the test. The test-retest method is considered unsuitable for many psychological tests, and is used in such cases mainly in the absence of an alternate form of the test.
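As an illustration of the test-retest method, the coefficient of stability may be computed as the product-moment correlation between the two administrations. The sketch below assumes nothing beyond that definition; the helper `pearson_r` and the two score lists are hypothetical and are not data from the text.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical scores of six individuals on the first test and on the retest
first_scores = [12, 15, 9, 20, 17, 11]
retest_scores = [13, 14, 10, 19, 18, 12]

stability = pearson_r(first_scores, retest_scores)
print(round(stability, 3))
```

The closer the two sets of scores, the nearer this coefficient lies to 1.00.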


2. Alternate-forms or parallel-forms method :

In this method, two parallel or equivalent forms of a test are administered to the same group of individuals and the scores of the two tests are correlated for working out the coefficient of equivalence which gives the reliability of the original test. Equivalent or parallel forms of a test should possess an identical true variance ($s_\infty^2$), similar item difficulties, similar item-total correlations, and independent error variances ($s_e^2$) with no overlap. Criteria of parallelism for such alternate forms of a test include identical score



distributions, identical mean scores, identical variances, perfectly matched contents, equal item-total correlations, same item-intercorrelations, identical method of administration, and the same number, types and difficulty levels of test items.

The equivalent-forms reliability may be expressed in two ways.

(a) Coefficient of stability and equivalence : It is the correlation coefficient between two sets of scores obtained by administering two equivalent or parallel forms of a test to the same group on two different occasions separated by a time interval. It measures both equivalence of contents of the two tests and temporal stability, i.e., the dependability of measurements over a time interval. The error variance ($s_e^2$) results here from both temporal variations of performance and variations of scores due to different sets of items in the two tests ; the latter may be termed item specificity.

(b) Coefficient of equivalence : It is the correlation coefficient between the scores of two equivalent tests administered simultaneously or in immediate succession to the same group. It measures only the equivalence of the two tests and not the temporal stability. Here, $s_e^2$ results from variations of performance in different sets of items, or item specificity, alone.

This alternate or equivalent-forms method is highly suitable for speed tests, but cannot serve for heterogeneous tests unless item-intercorrelations have been taken care of while developing the parallel tests. In practice, it is difficult to get precisely equivalent forms fulfilling all the criteria of parallelism ; the equivalent-forms reliability may be affected by any deviation from precise equivalence such as an overlap of error variances ($s_e^2$), inequality of either $s_t^2$ or true variance ($s_\infty^2$), variations of $s_\infty^2$ owing to change in content of the alternate form, rise of $s_e^2$ because of different item difficulties of the parallel forms of the test, fluctuations in testing environment and [...]. The time interval between the administrations of the two tests, individual changes of motivation, practice effects resulting from the use of similar item contents in the parallel forms, distractions, fatigue and boredom influence the error variance ($s_e^2$) to affect the equivalent-forms reliability just like the test-retest reliability. But unlike the test-retest method, the item contents are not identical, only similar, on the two occasions of administration of the parallel tests. This decreases the memory effect and the practice effect resulting from prior use. So, the alternate-forms method is frequently preferred to the test-retest method and used in a larger number of cases than the latter.

3. Split-half method :

In this method, a single form of test is administered to a group of individuals, but either the testing procedure or the scores is/are divided into two equivalent halves and the obtained scores of the two halves are correlated to give the split-half reliability coefficient ($r_{hh}$). This test is based on the assumption that different items of a test estimating any attribute should measure the same variable and should thus be internally consistent. So, the correlation coefficient between the scores of the half-tests acts as a measure of equivalence and consistency of the two half-tests.

The entire test may be split into two parallel halves possessing items of equivalent difficulties and identical item correlations. The two halves are then administered as separate tests to the same group either simultaneously or in immediate succession. However, because reliability is a function of test length and increases with the latter so long as homogeneity is maintained, the split-half correlation coefficient ($r_{hh}$) is extended by the Spearman-Brown formula to give the reliability coefficient of the whole test.




Where $r_{xx}$ is the reliability coefficient of the test of unit length, $k$ is the number of times the test is to be lengthened, and $r_{kk}$ is the reliability coefficient of the lengthened test, the Spearman-Brown formula is :

$$r_{kk} = \frac{k\,r_{xx}}{1 + (k - 1)\,r_{xx}}.$$

For the split-half method, $r_{xx} = r_{hh}$, $k = 2$, and $r_{kk} = r_{tt}$.

$$\therefore r_{tt} = \frac{2r_{hh}}{1 + r_{hh}}.$$

This extended $r_{tt}$ of the whole test is called the coefficient of internal consistency because it measures the equivalence of contents and hence, the consistency of the two half-tests. It is higher than the test-retest $r_{tt}$ and the equivalent-forms $r_{tt}$ because a simultaneous administration of both half-tests under identical conditions lowers $s_e^2$ substantially. But such simultaneous administration of both half-tests may increase the "true variance" and the $r_{tt}$ falsely by causing the half-test scores to suffer from errors in the same direction. Moreover, arbitrariness in splitting may add an element of bias to the computed $r_{tt}$.

This split-half method is eminently suitable for power tests where all candidates get enough time to attempt every item (§ 10.3). In a power test with items arranged in an ascending order of difficulty, an odd-even split is often done by separating the scores of items, bearing odd and even serial numbers, into two series to represent separate half-tests, thus ensuring equivalent difficulties of the two halves.

The internal consistency coefficient may be misleading when different test items explore [...] attribute.

However, this method is not suitable for high speed tests where all examinees cannot attempt all the items. But it can be used for speed tests if the splitting of test items into two halves is done in terms of both time and item difficulty ; the two halves can then be administered to the same group, one immediately after the other, as two independently timed tests.

The split-half method is not suitable for heterogeneous tests because the items of such a test cannot be grouped into two equivalent halves. But it may be used for homogeneous power tests in case of shortage of time and nonavailability of an alternate form of the test.

4. Rational equivalence method :

This is another method used to measure the internal (item-item) consistency of a homogeneous test whose items are all estimating a single specific attribute. The $r_{tt}$ computed by this method is also a coefficient of internal consistency and measures both homogeneity of test items and equivalence of contents.

In the rational equivalence method, the entire test is administered at a time to a group of individuals. As splitting of the test is avoided, no bias is introduced in the computed $r_{tt}$ owing to arbitrary grouping of items in separate halves. Assumptions for this method include similar item difficulties, dichotomy (e.g., yes/no, 1/0) of test items, uniform and high item-total correlations of scores, and the dependence of total variance ($s_t^2$) on item variances and covariances. Such assumptions prevent its application to heterogeneous and speed tests.

Where $n$ is the number of test items, $s_t^2$ is the variance of total test scores, $p$ is the proportion of subjects answering an item correctly, $q$ is the proportion of subjects answering it wrongly, and $\sum pq$ is the sum of item variances given by the product of $p$ and $q$ for each test item, $r_{tt}$ is computed by the Kuder-Richardson formula 20 (K-R 20) :




$$r_{tt} = \frac{n}{n - 1}\left(1 - \frac{\sum pq}{s_t^2}\right).$$

K-R 20 is actually based on analysis of variance (§ 11.2). It is a measure of the average item intercorrelation of the test and an estimate of the average of split-half correlations obtained from all possible splittings of the test.

$r_{tt}$ ranges from 0 to 1.00 in value. Lower and less accurate $r_{tt}$ is given by Kuder-Richardson formula 21, using the mean score $\bar{X}$ :

$$r_{tt} = \frac{n}{n - 1}\left[1 - \frac{\bar{X}(n - \bar{X})}{n s_t^2}\right].$$


Example 10.5.2.

The reliability coefficient for a test of 20 items was found to be 0.48. What will be the reliability coefficient if the test is lengthened by 10 more items ?

Solution :

$r_{xx} = 0.48$.

$k = \dfrac{\text{final test length}}{\text{initial test length}} = \dfrac{20 + 10}{20} = 1.5$.

$r_{kk} = \dfrac{k\,r_{xx}}{1 + (k - 1)\,r_{xx}} = \dfrac{1.5 \times 0.48}{1 + (1.5 - 1)0.48} = 0.58$.


Example 10.5.3.

Find the coefficient of internal consistency for a whole test whose split halves are correlated by 0.52.

Solution :

$r_{hh} = 0.52$.

$\therefore r_{tt} = \dfrac{2r_{hh}}{1 + r_{hh}} = \dfrac{2 \times 0.52}{1 + 0.52} = 0.684$.
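Examples 10.5.2 and 10.5.3 both apply the Spearman-Brown formula, which may be sketched as a single Python function. The name `spearman_brown` is an assumption of this sketch ; the formula itself is the one used in the two solutions.

```python
def spearman_brown(r_unit, k):
    """Reliability of a test lengthened k times, given its unit-length reliability."""
    return k * r_unit / (1 + (k - 1) * r_unit)


# Example 10.5.2: 20 items lengthened by 10 more items, so k = 30 / 20 = 1.5
lengthened = spearman_brown(r_unit=0.48, k=(20 + 10) / 20)

# Example 10.5.3: split-half correlation stepped up to the whole test, k = 2
whole_test = spearman_brown(r_unit=0.52, k=2)

print(round(lengthened, 2), round(whole_test, 3))
```

With $k = 1$ the function returns the unit-length reliability unchanged, which is a quick check on the formula.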


Example 10.5.4.

Use Kuder-Richardson formula 21 to compute $r_{tt}$ of a test consisting of 56 items where the mean and SD of all the scores amount to 32 and 10.5 respectively.

Solution :

$n = 56$ ; $\bar{X} = 32$ ; $s_t = 10.5$ ; $\therefore s_t^2 = (10.5)^2 = 110.25$.

$r_{tt} = \dfrac{n}{n - 1}\left[1 - \dfrac{\bar{X}(n - \bar{X})}{n s_t^2}\right] = \dfrac{56}{56 - 1}\left[1 - \dfrac{32(56 - 32)}{56 \times 110.25}\right] = 0.892$.




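Example 10.5.5.

In a test consisting of 20 items, the sum of the item variances ($\sum pq$) and the SD of all the test scores amounted respectively to 8.3725 and 6.5. Calculate the coefficient of internal consistency of the test by Kuder-Richardson formula 20.

Solution :

$s_t = 6.5$ ; $s_t^2 = (6.5)^2 = 42.25$. $\sum pq = 8.3725$. $n = 20$.

$r_{tt} = \dfrac{n}{n - 1}\left(1 - \dfrac{\sum pq}{s_t^2}\right) = \dfrac{20}{20 - 1}\left(1 - \dfrac{8.3725}{42.25}\right) = 0.844$.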
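The two Kuder-Richardson formulae used in Examples 10.5.4 and 10.5.5 can be sketched in Python. The function names `kr20`, `kr21` and `sum_pq_from_items` are hypothetical ; the last one assumes dichotomous (1/0) item scores, as the rational equivalence method requires, and its small response matrix is invented for illustration.

```python
def kr20(n_items, sum_pq, total_variance):
    """Kuder-Richardson formula 20 from the sum of item variances (sum of pq)."""
    return n_items / (n_items - 1) * (1 - sum_pq / total_variance)


def kr21(n_items, mean_score, total_variance):
    """Kuder-Richardson formula 21 from the mean of the total scores."""
    item_term = mean_score * (n_items - mean_score) / (n_items * total_variance)
    return n_items / (n_items - 1) * (1 - item_term)


def sum_pq_from_items(responses):
    """Sum of p*q over items for a list of per-person lists of 1/0 scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    total = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_people  # proportion correct
        total += p * (1 - p)                                   # q = 1 - p
    return total


# Example 10.5.4: n = 56, mean = 32, SD = 10.5
r_kr21 = kr21(n_items=56, mean_score=32, total_variance=10.5 ** 2)

# Example 10.5.5: n = 20, sum of pq = 8.3725, SD = 6.5
r_kr20 = kr20(n_items=20, sum_pq=8.3725, total_variance=6.5 ** 2)

# Hypothetical 4 persons x 2 items, only to show how the sum of pq is obtained
demo_sum_pq = sum_pq_from_items([[1, 0], [0, 0], [1, 1], [1, 0]])

print(round(r_kr21, 3), round(r_kr20, 3), demo_sum_pq)
```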



Example 10.5.6.

If a test consisting of 20 items has a reliability coefficient of 0.45, how many times longer should it be made to have a reliability coefficient of 0.80 ?

Solution :

$r_{xx} = 0.45$ ; $n = 20$ ; $r_{kk} = 0.80$.

$k = \dfrac{r_{kk}(1 - r_{xx})}{r_{xx}(1 - r_{kk})} = \dfrac{0.80(1 - 0.45)}{0.45(1 - 0.80)} = 4.9$.

So the test should be made 4.9 times longer. Thus, the lengthened test should have the following number of test items :

$nk = 20 \times 4.9 = 98$ items.
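The rearranged Spearman-Brown relation of Example 10.5.6 may be sketched as below. The function name `required_lengthening` is an assumption of this sketch, and rounding the item count upwards with `math.ceil` is a choice made here, not something stated in the text.

```python
import math


def required_lengthening(r_unit, r_target):
    """Number of times a test must be lengthened to reach a target reliability."""
    return r_target * (1 - r_unit) / (r_unit * (1 - r_target))


# Example 10.5.6: 20 items, reliability 0.45, target reliability 0.80
k = required_lengthening(r_unit=0.45, r_target=0.80)
items_needed = math.ceil(20 * k)   # round up to a whole number of items

print(round(k, 1), items_needed)
```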


10.6 VALIDITY OF A TEST

Validity is the capacity of a test to measure and predict the specific variable under investigation, in exclusion of other variables. It indicates (i) the relevance of the test to the variable or trait to be investigated, (ii) the discriminatory power of the test to exclude the measurement of other variables, and (iii) the predictive value of the test for only the specific trait. A test for clerical aptitude is valid if it can predict success in clerical jobs, but gives no measure of any other variable like the aptitude for salesmanship. Validity is affected by systematic errors of measurement (page 251).

Validity should not be generalized beyond the specific purpose or standard of a test. It needs empirical investigation. So, for constructing a valid test, (i) the attribute to be investigated should be precisely fixed, (ii) the test items should be carefully chosen to reveal and measure that attribute in exclusion of others, and finally (iii) its validity should be determined by correlating the test scores with some external criterion. The latter may be either an objective measure of the same attribute or any other variable representing the latter. External criteria differ according to the purpose of the test, because different tests require different types of validity. External criteria for intelligence tests include school marks, teacher's ratings, grades in public examinations, achievement test scores, and






similar but more elaborate tests of established validity like Binet-Simon and Wechsler scales. Achievement tests are validated against the criteria of actual courses of study, analysed and chosen by experts. The criteria for personality tests are the actual forms of behaviour, case histories, clinician's reports and comparison of test scores before and after therapy.

Validity is a measure of accuracy with which the test scores predict the variable to be explored by the test. It is ordinarily taken to be directly proportional to the magnitude of the validity coefficient ($r_{XY}$ or $r_{YX}$) which is the correlation coefficient between the scores ($X$) of the given test and those ($Y$) of an external criterion. Validity varies with the nature of the group tested ; a group with a wider ability range yields a higher $r_{YX}$ than a small selected group.

Sometimes, different meanings of validity are expressed by terms like intrinsic and relevant validities.

(a) Intrinsic validity indicates the capacity of test scores ($X_t$) to denote true scores ($X_\infty$). It is given by the intrinsic coefficient ($r_{t\infty}$) which is the square root of the reliability coefficient ($r_{tt}$) : $r_{t\infty} = \sqrt{r_{tt}}$.

(b) Relevant validity shows the extent to which a test measures the factors common with another test. The coefficient of relevant validity is the correlation coefficient of test scores with other measures, considering the common factors and eliminating the specific variance of the test scores. It is given by the square root of common factor variance in the scores.

Validity of a homogeneous test may be enhanced by increasing the test length without disturbing the homogeneity, because such an elongation either increases the proportion of the true variance $s_\infty^2$ or adds new factors to increase the common factor loading of the test. The validity coefficient of the elongated test depends upon the reliability coefficient ($r_{XX}$) of the test of unit length, the validity coefficient ($r_{XY}$ or $r_{YX}$) of the same, and the number ($k$) of times the test has been elongated.

$$r_{Y(kX)} = \frac{r_{YX}}{\sqrt{\dfrac{1 - r_{XX}}{k} + r_{XX}}}; \qquad k = \frac{1 - r_{XX}}{\dfrac{r_{YX}^2}{r_{Y(kX)}^2} - r_{XX}}.$$

[...] and never attains the perfect state with the rise in test length. The SE of estimate measures the average random error in predicting individual criterion scores from test scores owing to an imperfect validity of the test.

$$\text{SE of estimate} = s_Y\sqrt{1 - r_{YX}^2}.$$
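The SE of estimate can be illustrated with a short Python sketch. The function name `se_of_estimate` and the criterion SD of 12 are hypothetical values chosen only for illustration ; the validity coefficient 0.64 is borrowed from the example that follows in the text.

```python
import math


def se_of_estimate(s_y, r_yx):
    """Standard error of estimate of criterion scores predicted from test scores."""
    return s_y * math.sqrt(1 - r_yx ** 2)


# Hypothetical criterion SD of 12 with a validity coefficient of 0.64
error = se_of_estimate(s_y=12, r_yx=0.64)
print(round(error, 2))

# With no validity the error equals the criterion SD; with perfect validity it vanishes
print(se_of_estimate(12, 0.0), se_of_estimate(12, 1.0))
```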


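Example 10.6.1.

The validity coefficient between the test scores and the criterion scores is 0.64 and the reliability coefficient of the test is 0.55. (a) What will be the validity coefficient if the test is elongated to twice its original length ? (b) How many times should the test be lengthened to get a validity coefficient of 0.80 ?

Solution :

$r_{YX} = 0.64$ ; $r_{XX} = 0.55$.

(a) When the test is elongated to twice its original length, $k = 2$.

$r_{Y(kX)} = \dfrac{r_{YX}}{\sqrt{\dfrac{1 - r_{XX}}{k} + r_{XX}}} = \dfrac{0.64}{\sqrt{\dfrac{1 - 0.55}{2} + 0.55}} = 0.73$.

(b) To get a validity coefficient $r_{Y(kX)}$ of 0.80,

$k = \dfrac{1 - r_{XX}}{\dfrac{r_{YX}^2}{r_{Y(kX)}^2} - r_{XX}} = \dfrac{1 - 0.55}{\dfrac{(0.64)^2}{(0.80)^2} - 0.55} = 5$.

So, the test should be lengthened to 5 times its original length for getting a validity of 0.80.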
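Both parts of Example 10.6.1 can be checked with a short Python sketch. The function names `validity_of_lengthened_test` and `lengthening_for_validity` are hypothetical ; the formulae are the two relations between validity, reliability and test length used in the solution.

```python
import math


def validity_of_lengthened_test(r_yx, r_xx, k):
    """Validity coefficient after lengthening the test k times."""
    return r_yx / math.sqrt((1 - r_xx) / k + r_xx)


def lengthening_for_validity(r_yx, r_xx, r_target):
    """Number of times the test must be lengthened to reach a target validity."""
    return (1 - r_xx) / (r_yx ** 2 / r_target ** 2 - r_xx)


# Part (a): test doubled in length
doubled = validity_of_lengthened_test(r_yx=0.64, r_xx=0.55, k=2)

# Part (b): lengthening needed for a validity coefficient of 0.80
k_needed = lengthening_for_validity(r_yx=0.64, r_xx=0.55, r_target=0.80)

print(round(doubled, 2), round(k_needed, 2))
```

As $k$ grows without limit the first function approaches $r_{YX}/\sqrt{r_{XX}}$, so lengthening alone cannot raise validity indefinitely.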


Types of validity

Three types of validity, viz., content validity, construct validity and criterion-related validity, have been recognized by the American Psychological Association. In addition to these, a fourth type of validity, called the job-component validity, has also been recognized in industrial psychology. The type of validity to be estimated is determined by the purpose of the test.

1. Content validity :

It is based on the logical analysis and proper sampling of the contents of the test. It is estimated to ensure the relevance of both the individual test items and the total test contents to the behavioral domain under consideration. It also investigates whether different aspects of the relevant behaviour are assessed in correct proportions by the test items, and how far those items and the cognitive processes involved form a representative sample of the variable to be measured.

Content validity thus assesses the suitability of a test in evaluating the existing status of an individual in a specific area of behaviour. It is normally used for educational achievement tests by computing the correlation coefficient between the test scores (predictor) and some independent criterion for the pertinent variable. In work psychology, content validity is estimated while validating the tests for job performance. For this, the test items are examined, before actual application of the test, to find how far their contents are appropriate for the selection of personnel for the job. But content validity is not applicable to personality tests.

Content validity is worked out in the following ways.

(i) When both the test scores ($X$) and the criterion scores ($Y$) constitute continuous variables, content validity coefficient is given by the product-moment $r_{XY}$ between the two.

(ii) For correlating continuous test scores with a dichotomized criterion, either point biserial $r$ or biserial $r_b$ is computed according to the genuine or arbitrary nature of the dichotomy of the criterion (pages 173-180).

(iii) When both the criterion and the test scores are dichotomized, either phi coefficient or tetrachoric $r$ is computed between them according to the genuine or arbitrary nature of the dichotomy (pages 181-185).

(iv) In case of more than one test as the





predictors, validity is given by the multiple correlation coefficient ($R$) between the combined test scores and the criterion (pages 158-16[?]).

A test should not be taken as valid merely because its contents seem apparently to measure the ability being investigated. Such an appearance, often termed face validity, is considered as only one aspect of content validity ; it cannot be subjected to any statistical treatment and may merely serve in motivating the candidate as well as the investigator, and is sought for this purpose in professional selection tests and achievement tests of adults.

2. Construct validity :

Psychological construct is a hypothetical concept of such an abstract quality or attribute like intelligence, emotional stability, mechanical ability and inner drive, as cannot be observed directly, but can be measured from the behavioral expressions.

Construct validity is estimated to assess the suitability of a test in evaluating the status of a person in a given abstract area of behaviour called a psychological construct. It may also be worked out for attitude scales and personality tests.

Construct validity may also be estimated for the validation of ratings such as those of job characteristics ; however, an appropriate external criterion is often hard to get for the construct validation of ratings.

[...] each test item should be chosen with a clear understanding of [...] so as to reflect, detect and measure the latter. The performance in the test is subsequently [...] agreement with the psychological [...]. Construct validity is ensured if [...] the given test have either a significant, moderately high positive correlation with scores in other tests known to measure the same trait satisfactorily (convergent validity) or negligible or negative correlation with the scores in tests for dissimilar traits (discriminant validity). For example, the construct validity of an intelligence test is ensured either by the significantly high positive correlation of its scores with those of already proven intelligence tests or sometimes, by their poor correlation with those of musical or other special ability tests. Correlation is computed here between the scores of the test under investigation and more than one criterion.

The construct validity coefficient depends on the sharing of some common factors by the test and the criteria ; hence, factor analysis (§ 10.8) may be undertaken for construct validity.

3. Criterion-related validity :

This includes predictive and concurrent validities. For both, scores of the test (predictor) are usually correlated with scores of an independent criterion by computing either Spearman's rho or Pearson's product-moment $r$. In validating tests for personnel selection, a measure of job performance is usually chosen as the criterion.

(a) Predictive validity : It is given by the correlation between the scores of the test for predicting a particular trait, and those of a criterion that is a measure of a specific subsequent performance. Its main objective is a prediction of future performance. Selection or classification of candidates can be done on the basis of test scores supposedly reflecting the future performance. For example, the scholastic aptitude test for admission to an academic [...] validated against subsequent [...] achievements in the school. The predictive validity of a test for personnel selection is worked out by correlating the test scores, collected at the time of appointment,




with the scores of a job performance test [...] subsequently. The accuracy of prediction is indicated by the coefficient of validity which is the correlation coefficient between the test scores and the criterion scores. Expectancy tables are also used for predictive validity (§ 10.7). The computation of predictive validity involves [...] because validation is done [...] a follow-up criterion of subsequent performance. [...] perfect predictive validity, the [...] should be highly reliable and self-[...], enabling the generalization from the [...] one variable to those of another. For [...] selection tests, a moderate correlation [...] the purpose. A test for predictive validity [...] have good content and construct validities for its [...] application.

(b) Concurrent validity : It is given by the correlation of a test with an existing criterion of, say, job performance instead of a subsequent performance. So, it needs less time and labour. For example, the concurrent validity of an arithmetic test is given by the correlation coefficient between the test scores and already achieved class grades of the tested individuals. Similarly, the scores of any intelligence test can be validated against the scores already obtained in the Stanford-Binet scale. Thus, concurrent validity shows the validity of the test as a measure of the present status of an individual while predictive validity assesses the test as a measure of his future status.

When both the predictor and the criterion are perfectly reliable, criterion-related validities tend to reach maximum magnitudes, approaching but rarely attaining the square root of the reliability coefficient of the test.

4. Job-component validity :

The concept of this validity is based on the assumption that the requirements for a given job should have some components common or comparable with those [...] relevant [...] in job [...]. [...] component is a specific skill, it refers to content validity [...] estimated as such. In [...] attribute, on the contrary, it may be estimated as construct [...].

Relation between reliability and validity

(a) A test is far from [...] cannot measure a trait reliably. [...] reliable test should also be theoretically valid because proper item selection and elongation lead to increased proportions of [...] variances and hence, to enhanced common factor loadings. However, reliability does not always [...] validity in practice.

(b) [...] validity [...] the common factor [...] which, however, form a component of [...].

(c) Reliability is a measure of self-correlation of a test and depends on the identical difficulty levels and high internal consistency of test items. But predictive validity depends on differing difficulty levels of test items and their low correlations. So, both reliability and validity may not be uniformly high for a test.

(d) Lengthening a test increases the proportion of common factor variances and hence, the proportion of true variances also. Rise in the number of common factors, shared by the test and the criterion, increases the validity while the rise in the proportion of true variance enhances reliability. But unlike reliability, validity cannot be raised to the maximum level by the increase in the test length alone.

(e) Reliability may be low in a heterogeneous test with its items measuring different factors, but a high predictive validity may still






result from the common factors shared by the 
test items and the criterion. Reliability may be 
high in a homogeneous test with high internal 
consistency. But validity may often still be low 
due to the dearth of common factors for the 
items and the criterion. 

Tests should possess sufficient validities 
along with reasonable degrees of reliability. A 
test serves no purpose if it gives consistent 
results on repetition, but fails to measure the 
desired trait. Both reliability and validity may 
be reasonably ensured by replacing a single test 
with a test battery of heterogeneous type, but 
consisting of individual tests of homogeneous 
natures. 

10.7 EXPECTANCY TABLE 

An expectancy table is constructed from a scatter diagram or correlation table giving the bivariate distribution of two continuous variables (pages 150-152). It expresses the relationship between the test scores (predictor) and the scores of the criterion. It indicates the probability of attaining a particular grade or class of performance when the test scores are known. Where both the test scores and the criterion scores constitute continuous variables, the paired scores of the two for the tested individuals are arranged in a two-way bivariate frequency distribution which is used as the correlation table (pages 150-152). Each cell frequency of such a correlation table or scatter diagram (Table 10.5) is converted to a percentage of the marginal total $f_r$ of the row to which that cell belongs (Table 10.6). Expectancy tables may be constructed even with more than one predictor, when necessary, or with two classes of a dichotomized criterion.
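The conversion of cell frequencies into row percentages can be sketched in Python. This is a minimal illustration under stated assumptions : the scores are taken to be already grouped into classes, the class labels and the twelve paired observations are hypothetical, and `expectancy_table` is a name invented for this sketch (it does not reproduce Tables 10.5 and 10.6).

```python
from collections import Counter


def expectancy_table(test_classes, criterion_classes):
    """Return {test class: {criterion class: percentage of the row total}}."""
    cell_counts = Counter(zip(test_classes, criterion_classes))  # cell frequencies
    row_totals = Counter(test_classes)                           # marginal totals f_r
    columns = sorted(set(criterion_classes))
    table = {}
    for row in sorted(row_totals):
        table[row] = {
            col: 100.0 * cell_counts[(row, col)] / row_totals[row]
            for col in columns
        }
    return table


# Hypothetical paired classes of test scores and criterion grades for 12 persons
test_classes = ["low"] * 4 + ["mid"] * 4 + ["high"] * 4
criterion_classes = [
    "fail", "fail", "fail", "pass",   # low scorers
    "fail", "pass", "pass", "fail",   # middle scorers
    "pass", "pass", "pass", "fail",   # high scorers
]

table = expectancy_table(test_classes, criterion_classes)
for row, percentages in table.items():
    print(row, percentages)
```

Each row then reads as the percentage chance of attaining each criterion class for persons in that class of test scores.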

10.8 FACTOR ANALYSIS 

Factor analysis is the right statistical procedure for construct validity of tests. If scores of a test correlate highly with scores of another similar and already validated test or criterion measuring the same attribute, the two tests may have common factors shared by their scores (convergent validity). The poor correlation between the scores of two tests, measuring dissimilar attributes, may result from their poor loading with common factors (discriminant validity). Thus, factor analysis can explain the intercorrelations between tests by analyzing the common factors.

Factor analysis identifies the factors or 
psychological components in a test, assesses 
their relative independence, correlates them 
individually to a criterion for determining their 
individual contributions to the total test scores, 






the weights of factors responsible 
variance of scores in two tests. 
J* ^ for common factors, shared by 

fc ^ ^ responsible for their correlation. 

^ factors can explain the inter- 
between a group of tests. For 
for arithmetic problems, number 
addition and multiplication may 
positive correlations with each other 
s common ‘numerical factor’ in all of 
^ ' but these tests show low and 

T^jgcaot correlations with vocabulary and 
completion tests, loaded with a ‘verbal 
^^iostead of the ‘numerical’ one. Factor 
*VL * given by the loading of a test with 
S^'and its correlation with each factor. So, 
. , v /analysis validates a test in the form of 
loading or intercorrelation between each 
jy. jed the factor. It is usually applied to the 
m of interdependent variables, not 
ifijrcntiated into dependent and independent 
_ u identifies the most specific ability 
jawed for a particular task and reduces the 
jssber of original test items to a smaller 
juccer of common factors sufficient to explain 
ic correlation between the tests. 

A good criterion can be ensured by its factor analysis. This intercorrelates criterion measures among themselves as also correlates them with validated tests of anticipated common factors. Investigations of factor loadings of the criterion help to choose the most relevant and representative criterion, to include the most predictive tests in a test battery, to give weights to criteria for combinations, and to detect the insufficiency of a proposed test battery in predicting any factor of the criterion.

Of the numerous theorems of factor theory, theorems I and II suffice to explain validity.

Theorem I of factor theory :

True variance ($s_\infty^2$) of a test score is the sum of (i) common factor variances ($s_a^2, s_b^2, \ldots, s_k^2$) for the common factors ($a, b, \ldots, k$) shared by many tests, and (ii) a specific factor variance ($s_s^2$) of the given test, which is shared only by its parallel and equivalent forms.

$$s_\infty^2 = s_a^2 + s_b^2 + \cdots + s_k^2 + s_s^2.$$

The total variance $s_t^2$ equals the sum of $s_\infty^2$ and the error variance $s_e^2$. Where $a_x^2$, $b_x^2$, etc., are the proportions of $s_t^2$ due to the common factors, $s_x^2$ or specificity is the proportion of $s_s^2$ in $s_t^2$, and $e_x^2$ is the proportion of $s_e^2$ in $s_t^2$,

$$s_t^2 = s_\infty^2 + s_e^2 = (s_a^2 + s_b^2 + \cdots + s_k^2 + s_s^2) + s_e^2;$$

mathematics on algebra test scores of Table 10.5. 







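or,

$$1.00 = \frac{s_a^2}{s_t^2} + \frac{s_b^2}{s_t^2} + \cdots + \frac{s_k^2}{s_t^2} + \frac{s_s^2}{s_t^2} + \frac{s_e^2}{s_t^2} = (a_x^2 + b_x^2 + \cdots + k_x^2 + s_x^2) + e_x^2;$$

or,

$$1 - e_x^2 = (a_x^2 + b_x^2 + \cdots + k_x^2) + s_x^2;$$

or,

$$\frac{s_\infty^2}{s_t^2} = (a_x^2 + b_x^2 + \cdots + k_x^2) + s_x^2.$$

Where $h_x^2$ or communality is the sum of proportions of common factor variances in the test scores, and $u_x^2$ or uniqueness is the sum of proportions of specific and error variances in the test scores,

$$r_{tt} = \frac{s_\infty^2}{s_t^2} = a_x^2 + b_x^2 + \cdots + k_x^2 + s_x^2;$$

or,

$$r_{tt} - s_x^2 = h_x^2 = a_x^2 + b_x^2 + \cdots + k_x^2;$$

$$\therefore\; h_x^2 = \frac{s_a^2}{s_t^2} + \frac{s_b^2}{s_t^2} + \cdots + \frac{s_k^2}{s_t^2}, \quad \text{and}$$

$$u_x^2 = 1 - h_x^2 = s_x^2 + e_x^2.$$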

Factor loadings are the square roots of the proportions of common factor variances; each factor loading, viz., $a_x$, $b_x$, etc., is the correlation coefficient between the relevant common factor and the total test score. This correlation coefficient, called the factor validity, is an estimate of the capacity of the test to measure the trait underlying the given factor.
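The bookkeeping of Theorem I can be checked numerically. The following is a minimal sketch in Python, assuming purely hypothetical variance proportions for a single test; none of the numbers come from the text, and the variable names are illustrative only.

```python
# Hypothetical proportions of the total variance of one test (illustrative values).
common = {"a": 0.36, "b": 0.25, "k": 0.09}   # a_x^2, b_x^2, k_x^2
specificity = 0.15                            # s_x^2
error = 0.15                                  # e_x^2

# All proportions of the total variance must add up to 1.00.
total = sum(common.values()) + specificity + error

communality = sum(common.values())            # h_x^2: common factor variance only
uniqueness = specificity + error              # u_x^2 = 1 - h_x^2
reliability = communality + specificity       # r_tt = 1 - e_x^2

# Factor loadings are the square roots of the common factor proportions.
loadings = {factor: share ** 0.5 for factor, share in common.items()}

print(round(communality, 2), round(uniqueness, 2), round(reliability, 2))
print({factor: round(value, 2) for factor, value in loadings.items()})
```

With these assumed proportions the communality and uniqueness add up to 1, and the reliability exceeds the communality by exactly the specificity, as the theorem requires.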

Theorem II of factor theory :

Validity coefficient is the correlation coefficient ($r_{xy}$ or $r_{yx}$) between the test (X) and either an external criterion or another validated test (Y). It equals the sum of crossproducts of common factor loadings of X and Y, because $r_{xy}$ results from common factors shared by them and amounts to 0 in absence of such common factors. Construct validity coefficient can thus be computed from factorial validities or common factor loadings.

$$r_{xy} \text{ or } r_{yx} = a_x a_y + b_x b_y + \cdots + k_x k_y.$$
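Theorem II reduces to a sum of cross-products over the factors that both measures share. A short Python sketch follows, assuming hypothetical loadings for a test X and a criterion Y; the function name `validity_coefficient` and all numbers are illustrative and not taken from the text.

```python
def validity_coefficient(loadings_x, loadings_y):
    """Sum of cross-products of the loadings on factors common to X and Y."""
    shared = loadings_x.keys() & loadings_y.keys()
    return sum(loadings_x[f] * loadings_y[f] for f in shared)

# Hypothetical factor loadings (illustrative only).
test_x = {"numerical": 0.6, "verbal": 0.5}
criterion_y = {"numerical": 0.7, "verbal": 0.2, "spatial": 0.4}
unrelated = {"spatial": 0.8}

r_xy = validity_coefficient(test_x, criterion_y)     # 0.6*0.7 + 0.5*0.2
r_none = validity_coefficient(test_x, unrelated)     # no common factor

print(round(r_xy, 2), r_none)
```

The second call illustrates the closing remark of the theorem: with no common factor the computed correlation is 0.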


10.9 STANDARDIZATION AND NORMS

The final stage in test construction consists of the standardization of the test. A test has been standardized if (i) its items have been properly analyzed and chosen, (ii) procedures of administration and scoring have been made uniform, (iii) instructions for its application as also the scoring keys have been provided, and (iv) norms have been established and tabulated for interpreting the test scores.

A raw score can be made meaningful and significant by comparing it with a standard. For example, numerical scores of 125, 137 and 144 in the US Army Alpha Group Intelligence Test indicate only the relative positions of the tested individuals, and acquire real significance only when judged against the score distribution of a representative sample. To achieve this, the establishment of a norm is essential for each sample.

A norm is the average score of a representative group in terms of a convenient scale of converted or transformed scores. Raw scores are, therefore, frequently converted into transformed scores by either linear or nonlinear transformation. Transformed scores help the investigator to compare an individual's performance with those of other persons and with his own performance in different tests by rendering his relative positions clearly comparable with the standardization sample. Usually expressed in the same unit for different tests, transformed scores as well as their means, dispersions and forms of distribution are comparable. The essential criterion for establishing norms is the representative nature of the sample rather than its size. Norms are limited to the particular group or population from which the representative sample was drawn when establishing the norm. Local norms, group norms, regional norms or national norms are established separately according to the nature of the sample. Even the established norms may get outdated and need updating.




Transformations of raw scores

Raw scores belong to an arbitrary scale. So, they need transformation into a suitable scale for gaining a standard meaning, a common reference value and a comparability with other test scores. Raw scores need transformation also for ensuring additivity of the treatment effect, and for making the data normally distributed and homoscedastic (homogeneous with respect to variances), so that they may fulfil the assumptions for statistical tests like the t test and anova. Raw scores of psychological tests are frequently converted to (a) percentile scores, (b) age scores, (c) ratio IQ and (d) standard scores, and their variations like z scores, T scores, C scores, stanines and deviation IQ.


Linear transformations :

Linear transformations of raw scores merely change the zero point of the scale and/or the unit of measurement. So, they change the mean and SD of the raw scores without altering the original shape and properties like skewness and kurtosis of their distribution. The differences between raw scores correspond closely in relative magnitude to those between the respective linearly transformed scores (§ 5.4). The z and t scores are examples of linearly transformed scores.
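That a linear transformation leaves the shape of a distribution untouched can be verified directly. The sketch below is in Python and assumes a small set of hypothetical raw scores; the helper `skewness` is an illustrative implementation of the third standardized moment, not a formula quoted from the text.

```python
from statistics import mean, pstdev

def skewness(values):
    """Third standardized moment of a list of scores."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

raw = [2, 3, 3, 4, 4, 4, 5, 9]            # hypothetical, positively skewed raw scores
m, s = mean(raw), pstdev(raw)

z = [(x - m) / s for x in raw]            # new zero point (mean 0) and new unit (SD 1)
rescaled = [50 + 10 * v for v in z]       # further linear change to mean 50, SD 10

print(round(skewness(raw), 4), round(skewness(z), 4), round(skewness(rescaled), 4))
```

The three printed skewness values coincide, whereas the means and SDs of the three lists differ.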



Nonlinear transformations :

Nonlinear transformations change not only the mean and the SD, but also the shape, skewness and kurtosis of the original raw score distribution. They are often tried for converting a non-normal distribution of raw scores into a normal distribution of transformed scores. Percentile scores, T scores, C scores, stanines and mental age scores are examples of nonlinear transformations of raw scores in psychology.

Percentile or centile scale

Conversion of raw scores into percentile ranks or PR (page 42) is a form of nonlinear transformation frequently used during the standardization of psychological test scores. The percentile scale is a transformed ordinal scale. It gives a rectangular form to the frequency distribution of the relevant variable without affecting the rank orders of original scores. It expresses individual scores in terms of a typical sample of 100. Each interval of this scale contains an equal number of cases. A percentile point is a value, below which lies the stipulated percentage of cases (page 40). If 70% of the individuals of a sample have scored below 40 in a test, then 40 is the 70th percentile point while 70 is the PR of the score 40 (page 42). Percentile points of the original scale are transformed into the corresponding PR values to make the scores of an individual in different tests comparable to each other. Thus, a relative measure of his status in different traits can be obtained from his PR values in different tests.
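The example of the score 40 and the PR of 70 can be reproduced with a few lines of Python. This is only a sketch: it assumes the simple "percentage of cases strictly below the score" reading of the PR given above, and the ten raw scores are hypothetical.

```python
def percentile_rank(scores, value):
    """Percentage of cases in the sample scoring below the given value."""
    below = sum(1 for s in scores if s < value)
    return 100.0 * below / len(scores)

# Hypothetical sample in which 7 of the 10 individuals score below 40.
scores = [12, 18, 22, 25, 30, 33, 38, 40, 44, 51]

pr_of_40 = percentile_rank(scores, 40)
print(pr_of_40)
```

Because only the count of lower scores matters, any order-preserving change of the raw scores leaves every PR unchanged, which is why the scale is ordinal.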

Advantages : The percentile scale is (a) easy to compute, comprehend and interpret, (b) applicable to any type of ability or personality test without complicated statistics, and (c) applicable to both normal and non-normal distributions.

Limitations : This scale (a) shows only the relative rank of an individual in a representative sample and cannot indicate the actual difference between the test scores of two individuals, (b) is unsuitable for small samples, (c) is not useful for many subsequent statistical computations, and (d) shows identical differences in σ units as unequal differences (Fig. 10.2). For normally distributed raw scores, the difference between the PRs of 50 and 16 amounts to 34, but that between the PRs of 16 and 2 amounts to 14 only, although both differences represent a difference of 1σ. This is because larger differences near the tail ends of the normal curve are reduced and smaller differences near its centre are exaggerated in the percentile scale. These limitations often make the z scores preferable to PR.
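The unequal PR differences for equal σ differences follow from the areas of the unit normal curve. A brief Python check is sketched here using the standard library class `statistics.NormalDist`; the helper name `percentile_rank_of_z` is illustrative.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # normal curve with mean 0 and SD 1

def percentile_rank_of_z(z):
    """PR of a score lying z standard deviations from the mean of a normal distribution."""
    return 100 * unit_normal.cdf(z)

pr_at_mean = percentile_rank_of_z(0)       # score at the mean
pr_at_minus_1 = percentile_rank_of_z(-1)   # score 1 sigma below the mean
pr_at_minus_2 = percentile_rank_of_z(-2)   # score 2 sigma below the mean

gap_near_centre = pr_at_mean - pr_at_minus_1
gap_near_tail = pr_at_minus_1 - pr_at_minus_2

print(round(gap_near_centre), round(gap_near_tail))
```

Both gaps correspond to one σ, yet the first is roughly two and a half times the second on the percentile scale.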

Age scores

The commonly known age score is the mental age score of Binet-Simon Age Scale of Intelligence (1908) and its revisions. Here, the test items are specified for each age level and arranged in an ascending order of difficulty. The test measures the intellectual level of a child in terms of the average chronological age of normal children with the same intellectual standard. The intelligence level of the child is expressed as the mental age. The latter is defined as the age level of normal children whose average performance in the test coincides with that of the tested child. If a child, aged 8, answers successfully all the test items placed at the age level 8 (i.e., the items correctly answered by a large number of normal children of 8 years during the grading of items), then he is assigned an initial level of 8 years. The child is then assigned an additional credit of 1 year each, for passing all the items at each successive higher age level; but no partial credit is given to him for passing less than all the items at any age level. The sum of the initial level and all the additional credits gives his mental age in the 1908 scale and its 1911 revision. The average child of each age group has the same mental age as his chronological age. This original procedure was, however, considered unsound because of the absence of partial credits for part successes at any age level.

In the 1916 Stanford Revision of Binet Scale by L. M. Terman and Maud Merrill, provisions were made for the award of partial credits. The highest age level, where the child passed all the test items, is taken as the starting point or basal age for that child. Then, a stipulated number of months is credited to him for each item passed above that level. The age level, where the child can answer none of the items, is called the ceiling or apical age. The sum of the basal age and the additional fractional credits gives his mental age.

Age scores are differently used in other tests, not arranged this way according to age levels. Scores are awarded there either on the total number of items passed or on the total time taken. The mental age is then fixed with reference to the mean score of average children of the corresponding age level in the normative sample.

Ratio IQ

Mental age score itself carries a significance. But (i) it shows only the level of intellectual maturity of a child and cannot differentiate between average, bright and dull children because of no reference to their chronological age. (ii) The unit does not remain stable with advancing age; the difference in mental age between earlier years of life is greater than that between the later years, because the rate of intellectual development progressively declines with age. To remedy these limitations, Terman and Merrill converted the mental age score into a ratio called the intelligence quotient (IQ) or ratio IQ in their Stanford revision (1916) of the Binet scale. Where MA and CA represent respectively the mental age and the chronological age of a child,

$$\mathrm{IQ} = \frac{\mathrm{MA}}{\mathrm{CA}} \times 100.$$

IQ is a quotient norm or ratio norm which permits the interpretation of mental age scores relative to the chronological age. It thus places the mental age in a correct perspective with regard to the chronological age. Evidently, 100 will be the average IQ for each age group. IQs above and below 100 indicate respectively the acceleration and the retardation of intellectual performances.

IQ is a nonlinearly transformed age score. This transformation approximately normalizes the score distribution with a mean of 100 and an SD of [...]; it thus changes the original rank orders of mental age scores as well as the mean, SD and shape of the original mental age distribution.
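The ratio IQ formula is simple enough to state as a one-line function. The Python sketch below uses hypothetical mental and chronological ages (in years) chosen only for illustration; the function name `ratio_iq` is likewise an assumption.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ = (MA / CA) x 100, with both ages in the same unit."""
    return mental_age / chronological_age * 100

# Three hypothetical 8-year-old children with different mental ages.
bright = ratio_iq(10, 8)    # mental age ahead of chronological age
average = ratio_iq(8, 8)    # mental age equal to chronological age
dull = ratio_iq(6, 8)       # mental age behind chronological age

print(bright, average, dull)
```

The three children share one chronological age, so the ratio separates them in a way that the mental age alone, quoted without reference to CA, could not.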


Limitations : (a) Age scores including IQ cannot assess other traits which do not change with the chronological age. (b) Use of the chronological age in its computation poses a problem in determining IQs of older people with supposedly stopped intellectual developments; IQ goes on declining with age as mental age attains its peak at maturity. To avoid this, 16 was fixed as the highest age with complete intellectual growth. But IQ may still fluctuate. (c) IQ as defined in the 1937 revision of Stanford-Binet scale shows agewise and testwise variations of its SD. This may be remedied by using either a correlation table for each age level or, still better, the deviation IQ discussed later.


Standard scores

Standard scores are either linearly or nonlinearly transformed raw scores. The simplest form of linearly derived standard score is the z score (page 74). Being a form of linear transformation, z scores bear the same numerical relations and relative differences between each other as those between the raw scores from which they have been derived. Transformation into z scores cannot, therefore, normalize a non-normal raw score distribution and only alters the mean and SD of the latter.

There are two limitations of z. (i) A raw score below the mean gives a negative z score. (ii) The unit of 1σ is relatively large and often introduces decimals in the computed z. To avoid these limitations, transformed z scores or proper standard scores are computed by multiplying z scores with a constant number much higher than 1, and either adding the product to or subtracting it from another constant much higher than 0. For such further transformations, the mean is usually taken as 50 or 500, and the SD as 3, 10, 15 or 50.

Such transformed standard scores include the t scores ($\bar{X}_s = 50$, $s_s = 10$), the Scholastic Aptitude Test (SAT) scores ($\bar{X}_s = 500$, $s_s = 100$), the Graduate Record Examinations (GRE) scores ($\bar{X}_s = 500$, $s_s = 100$), Wechsler Block Design sub-test norms ($\bar{X}_s = 10$, $s_s = 3$), Differential Aptitude Test (DAT) scores ($\bar{X}_s = 50$, $s_s = 10$), and Wechsler deviation IQ scores ($\bar{X}_s = 100$, $s_s = 15$). For the SAT scores, for example,

$$z' = \bar{X}_s + s_s z = 500 + 100z.$$
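The same linear rule serves every scale listed above; only the chosen mean and SD change. A minimal Python sketch follows, with hypothetical z scores and an illustrative function name `transformed_score`.

```python
def transformed_score(z, new_mean, new_sd):
    """Linear conversion of a z score to a scale with the chosen mean and SD."""
    return new_mean + new_sd * z

sat_like = transformed_score(1.5, 500, 100)       # 500 + 100z
fifty_ten = transformed_score(-0.5, 50, 10)       # 50 + 10z
wechsler_like = transformed_score(-1.0, 100, 15)  # 100 + 15z

print(sat_like, fifty_ten, wechsler_like)
```

A negative or fractional z thus becomes a positive whole-number score on each of these scales, which is the stated purpose of the further transformation.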

Standard scores are converted to normalized standard scores by transforming raw scores into equivalent points on a normal distribution. Each such point corresponds to the same level of the variable as the respective raw score. For example, raw scores are converted to percentile ranks, and the latter to the corresponding z scores of a normal distribution. Normalized standard scores have the same mean of 0 and SD of 1 as those of the linearly derived standard scores. The normalized standard score can be transformed into T score, C score, stanine, deviation IQ, etc., by multiplying it with a convenient number and adding/subtracting that product to/from another desired number.

T scale

T scores are normalized standard scores computed by the nonlinear transformation of raw scores. The transformation of raw scores to T scores changes even a non-normal raw score distribution into a normal one, but leaves the rank orders of the scores unaltered (Fig. 10.2). The centre (mean) and SD of the T score distribution amount to 50 and 10 respectively. The T scale ranges from 0 (−5σ) to 100 (+5σ) and the unit of measurement amounts to 0.1σ.

Computation of T scores :

(a) The raw scores are grouped into class intervals whose midpoints (X_c) are also computed (page 16). The frequencies (f) of raw scores are entered in the respective class intervals (Table 10.7).

(b) For each class interval, the cumulative frequency (cf) up to its lower limit (X_l) is computed by adding the frequencies of all class intervals below it; for example, for the interval 36-40 of Table 10.7, cf = 3 + 8 = 11.

(c) For each interval, the cf up to its X_c is computed by adding half the frequency of that interval to the cf up to its X_l. For example, the cf up to the X_c of 36-40 amounts to: 11 + ½ × 12 = 17 (Table 10.7).

(d) For each interval, the cumulative proportion (cp) up to its X_c is computed by dividing the cf up to its X_c with the sample size n. For example, for the class interval 36-40 of Table 10.7, cp = 17.0 ÷ 100 = 0.17. The cp represents the corresponding PR expressed as a proportion.

(e) 0.50 is deducted from each cp and the z score for that obtained "area" is recorded from Table A of unit normal curve areas in the Appendix. This z score would correspond to the upper limit of the fractional area represented by the corresponding cp. For example, 0.50 deducted from the cp of 0.3050 gives an area of −0.1950; the z score corresponding to the fractional area of −0.1950 amounts to −0.51 (Table A). The z scores for all the class intervals are found out in this way and entered against the respective midpoints.

(f) Each z score is then converted to the corresponding T score as follows :

T = 50 + 10z.

For example, for the z score of −0.51,

T = 50 + 10z = 50 + 10 × (−0.51) = 44.9.

T scores are obtained in this way for the midpoints of all the class intervals.

Advantages of T scale : (a) T scale can change some non-normal raw score distributions into normal distributions. (b) It leaves the rank order of the raw scores unaltered. (c) Its mean, SD and unit are convenient for [...] it even beyond the given range of the population.

Limitations of T scale : (a) Its unit of 0.1σ is too fine for the purpose of many tests. (b) It has a high SE of measurement, making it difficult to distinguish between individuals by using it. (c) The scale is not capable of as accurate measurements as appears from its unit.

Table 10.7. Computation of T scores for a raw score distribution.

| Class intervals | X_c | f | cf up to X_l | cf up to X_c | cp up to X_c | z | T |
|---|---|---|---|---|---|---|---|
| 66-70 | 68 | 2 | 98 | 99.0 | 0.990 | +2.33 | 73.3 |
| 61-65 | 63 | 5 | 93 | 95.5 | 0.955 | +1.70 | 67.0 |
| 56-60 | 58 | 10 | 83 | 88.0 | 0.880 | +1.18 | 61.8 |
| 51-55 | 53 | 20 | 63 | 73.0 | 0.730 | +0.61 | 56.1 |
| 46-50 | 48 | 25 | 38 | 50.5 | 0.505 | +0.01 | 50.1 |
| 41-45 | 43 | 15 | 23 | 30.5 | 0.305 | −0.51 | 44.9 |
| 36-40 | 38 | 12 | 11 | 17.0 | 0.170 | −0.95 | 40.5 |
| 31-35 | 33 | 8 | 3 | 7.0 | 0.070 | −1.48 | 35.2 |
| 26-30 | 28 | 3 | 0 | 1.5 | 0.015 | −2.17 | 28.3 |
| Total | | 100 (n) | | | | | |

[Fig. 10.2. Different types of standard scores, cumulative percentages and percentiles. Scales shown include the C scale, stanines, deviation IQ (SD = 16), cumulative percentages and percentiles.]

C scale

The C scale is another scale of normalized standard scores. Because its units (0.5σ) are coarser than those of the T scale, it is used when less refinement may serve the purpose. It ranges from 0 to 10, the lowest and the highest categories bearing respectively the scores of 0 and 10. The scale, therefore, has 11 units with the mean at 5.0 and the SD of 2. The highest 1% of cases will thus have the C score of 10, next 3% will have a score of 9, and so on; the lowest 1% of cases will get a score of 0 (Table 10.8).

The C scale finds application in the nonlinear transformation of a non-normal distribution of raw test scores into a normal distribution. The C scale is usually derived graphically from the cumulative frequencies of class intervals, using percentile ranks and unit normal curve areas.

Advantages : (a) The C scale normalizes a non-normal distribution. (b) It facilitates discrimination between individuals in many tests because of its lower SE of measurement compared to the T scale.

Limitations : (a) Its coarser unit reduces its efficacy. (b) The 0 score of the scale makes it unsuitable for guidance and counselling purposes. (c) Because the shifting of 1 unit considerably changes the percentage, the C scale is not advisable for selection [...]. (d) Like other normalized standard scores, the C scale should be used only when the non-normality of the distribution has resulted from defects in the test and is not an inherent characteristic of the given variable in the population.

Stanine

The stanine (standard nine) scale was extensively used by the US Air Force during the second world war for nonlinear transformation of raw test scores into normalized standard scores. This scale is a somewhat condensed form of the C scale and is divided into 9 units with scores from 1 to 9 along the abscissa of the normal curve. This removes the inconvenience caused by the 0 score of the C scale. The highest and the lowest categories of the C scale, bearing the scores of 10 and 0, are merged in the stanine scale with the adjacent categories having scores of 9 and 1 respectively (Table 10.9). The two scales have the same mean of 5.0; but the SD of the stanine distribution is slightly lower and amounts to 1.96 because of the merger of categories at either end of the stanine scale. Like the C scale, one stanine unit extends over 0.5σ along the abscissa of the normal curve.


Table 10.8. The eleven-point C scale.

| C scores | Intervals in σ scores | Percentile rank limits | Percentages of cases |
|---|---|---|---|
| 10 | +2.25σ to +2.75σ | 98.8 to 99.7 | 0.9 ≈ 1 |
| 9 | +1.75σ to +2.25σ | 96.0 to 98.8 | 2.8 ≈ 3 |
| 8 | +1.25σ to +1.75σ | 89.4 to 96.0 | 6.6 ≈ 7 |
| 7 | +0.75σ to +1.25σ | 77.3 to 89.4 | 12.1 ≈ 12 |
| 6 | +0.25σ to +0.75σ | 59.9 to 77.3 | 17.4 ≈ 17 |
| 5 | −0.25σ to +0.25σ | 40.1 to 59.9 | 19.8 ≈ 20 |
| 4 | −0.75σ to −0.25σ | 22.7 to 40.1 | 17.4 ≈ 17 |
| 3 | −1.25σ to −0.75σ | 10.6 to 22.7 | 12.1 ≈ 12 |
| 2 | −1.75σ to −1.25σ | 4.0 to 10.6 | 6.6 ≈ 7 |
| 1 | −2.25σ to −1.75σ | 1.2 to 4.0 | 2.8 ≈ 3 |
| 0 | from below −2.75σ to −2.25σ | 0.3 to 1.2 | 0.9 ≈ 1 |
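The percentages of Table 10.8 are areas of the unit normal curve over successive 0.5σ intervals centred on the mean. The Python sketch below recomputes them with `statistics.NormalDist`; the function name is illustrative, and the exact areas may differ from the tabulated figures by about 0.1 because the table works from rounded percentile rank limits.

```python
from statistics import NormalDist

unit_normal = NormalDist()

def c_scale_percentages():
    """Percentage of cases falling in the 0.5-sigma interval of each C score."""
    percentages = {}
    for c in range(11):
        centre = (c - 5) * 0.5                 # C = 5 sits at the mean
        lower, upper = centre - 0.25, centre + 0.25
        percentages[c] = 100 * (unit_normal.cdf(upper) - unit_normal.cdf(lower))
    return percentages

percentages = c_scale_percentages()
rounded = [round(percentages[c]) for c in range(11)]
print(rounded)
```

Rounded to whole numbers, the areas give the familiar 1, 3, 7, 12, 17, 20, 17, 12, 7, 3, 1 split, which adds up to 100.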




The percentage of cases in the 0.5σ interval falls with increase in the distance of the stanine score from the mean.

The raw scores of a test can be readily transformed into stanines. Table 10.10 summarizes the assignment of stanine scores to the raw test scores of a sample of 25 persons. Only one individual obtained the highest test score, thus forming the topmost 4% of the 25 cases, and is assigned the stanine score of 9. Similarly, a single individual obtained the lowest test score, thus forming the lowest 4% of the cases and is assigned the stanine score of 1. The test scores immediately around the mean raw score were obtained by 5 persons who constitute the central 20% of cases and get the stanine score of 5. Two persons, forming 8% of the cases (close to the 7% allotted to this category), obtained test scores immediately below the top 4% and are assigned the stanine score of 8, and so on.
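The assignment described above can be sketched in Python. The category percentages 4, 7, 12, 17, 20, 17, 12, 7 and 4 are obtained from Table 10.8 by merging the end categories of the C scale; the 25 raw scores are hypothetical stand-ins, and placing each score at the midpoint of its rank (cases below plus half the ties) is an assumption of this sketch.

```python
from bisect import bisect_right

# Cumulative percentage limits separating stanines 1|2|3|4|5|6|7|8|9
# (from the category percentages 4, 7, 12, 17, 20, 17, 12, 7, 4).
LIMITS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanines(scores):
    """Assign a stanine (1 to 9) to every raw score of a sample."""
    n = len(scores)
    result = []
    for x in scores:
        below = sum(s < x for s in scores)
        equal = sum(s == x for s in scores)
        pr = 100 * (below + equal / 2) / n      # mid-rank percentile of the score
        result.append(bisect_right(LIMITS, pr) + 1)
    return result

raw_scores = list(range(30, 55))                # 25 hypothetical, distinct raw scores
assigned = stanines(raw_scores)
counts = [assigned.count(k) for k in range(1, 10)]
print(counts)
```

For 25 distinct scores this yields one person each in stanines 1 and 9, two each in stanines 2 and 8, and five in stanine 5, in line with the worked description above.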


Advantages : (a) The stanine transforms a non-normal distribution of raw scores into an approximately normal distribution. (b) Because of single-digit scores (1-9), the stanine scale is more suitable for computer card records than the C scale (scores 0-10).

Limitations : (a) The condensation in the stanine scale results in a grouping error. (b) Merger of the highest and the lowest categories, corresponding to the C scores of 10 and 0, with the respective neighbouring categories deprives the stanine scale of the power to discriminate between the highest 1% or the lowest 1% of cases and those immediately adjacent. This affects guidance work. (c) The grouping in the stanine scale lacks fineness and precision; so, the transformed scores form only an approximately normal distribution. Thus, z scores should suffice as the standard scores for a near-normal distribution of raw test scores and stanines need not be computed there. (d) Like other normalized standard scores, stanines should be used only when the non-normality of the raw score distribution has resulted from defects in the test and is not a characteristic of the variable in the population.

Table 10.9. The nine-point stanine scale.

Table 10.10. Assigning stanine scores to raw test scores of a sample of 25 persons.

Deviation IQ

Deviation IQ is another form of normalized standard score. It was first used in Wechsler Intelligence Scale (1939). Unlike the ratio IQ, deviation IQ suffers no agewise fluctuation because its SD does not change with age. The method consists of the computation of a weighted IQ from each raw score and the assignment of a mean of 100 and an SD of 15. The test users can interpret the test scores almost like the Stanford-Binet ratio IQ. The distributions of deviation IQ (Fig. 10.2) and of the 1937 revision of Stanford-Binet ratio IQ possess an identical mean and similar SDs (nearly 16). Deviation IQ has also been used in the L-M form of the 1960 revision of Stanford-Binet Scale, having a mean of 100 and an SD of 16. The transformation into deviation IQ gives the IQ a relative significance identical for all ages, and thereby makes the interpretation easier and less erroneous. Deviation IQ is particularly applicable to intellectually mature persons older than 16 years.

The fourth revision of Stanford-Binet Scale by Thorndike et al in 1986 used the composite Standard Age Scores (SAS) having a mean of 100 and an SD of 16. They are also normalized standard scores expressed in the same unit as the deviation IQ of 1960.


GLOSSARY

[...] Scale of Intelligence or its [...], which [...]

[...] of raw [...] of 2.

coefficient of equivalence : [...] tests, that is obtained by correlating their scores on administering [...] group of subjects either simultaneously or immediately after one another, and serves as an estimate of reliability by the alternate-forms method.

coefficient of internal consistency : a measure [...] of contents and consistency of two half-tests into which a test has been split for [...]

coefficient of stability and equivalence : [...] equivalence of contents of two tests and dependability of measurement over a time interval, that [...] by correlating the scores of two equivalent or parallel forms of a test administered to the [...] group of subjects at two different times and serves as an estimate of reliability by the alternate-forms method.

concurrent validity : suitability of a test in [...] his/her already achieved grade or [...]




construct validity : suitability of a test in evaluating the status of a subject in a given abstract behavioral [...]

content validity : relevance of both the individual test items and the total test contents of a test to the given [...] domain, and suitability of the test in evaluating the status of a subject in that behavioral [...]

convergent validity : suitability of a test in measuring a trait, as given by a significant positive correlation of its test scores with the scores in other tests known to measure the same trait satisfactorily.

criterion : either a dependent variable whose likely scores are sought to be predicted from the scores of a [...], or a dependent variable whose anticipated changes on exposure to a given independent variable are explored in an experiment.

dependent variable : the behavioral response which is either studied in an experiment on exposure of a [...] subjects to different levels of an independent variable, or predicted from the scores of an independent variable in a test.

deviation IQ : normalized standard scores which are weighted IQ scores, and form a distribution having a mean of 100 and an SD of either 15 (Wechsler scale) or 16 (L-M form of Stanford-Binet scale).

difficulty value : the level of difficulty in answering a test item, estimated by the proportion of right answers to it by the subjects of a sample in the relevant test.

discriminant validity : validity of a given test in measuring a trait, as shown by negligible or 0 correlations of its scores with the scores of the same group of subjects in tests for dissimilar traits.

discriminatory value : capacity of a test item to discriminate between the subjects with respect to their answers to that item.

error of measurement, random : errors from such uncontrolled factors as erroneous scoring and interpretation, and affecting the reliability of a test.

error of measurement, standard : a measure of the random errors of measurement, estimated as the square root of the error variance of the test scores.

error of measurement, systematic : either overestimation or underestimation of the test scores of all the subjects systematically due to factors like defective test construction, and affecting the validity of the test.

expectancy table : table constructed from scatter diagram or correlation table for expressing the relation between the scores of a given test and the scores of a criterion.

extraneous variable : any variable which, though neither the dependent nor the independent variable in an experiment, still occurs in the experimental situation, in the experimental procedure or in the subjects used in the experiment.

factor analysis : statistical analysis for identifying and assessing the factors determining the psychological or physical characteristics of subjects in a test, for correlating the factors individually to a criterion, and for exploring the construct validity of the test.

independent variable : the variable deliberately chosen and used by the investigator either to study its effect on the dependent variable, or to predict the likely value of the criterion in a test.

intervening variable : any unobserved and hypothetical variable, presumably interposed between independent and dependent variables of an experiment, and affecting the dependent variable.

irrelevant variable : any variable that happens to occur in the subjects, situation or procedure of an experiment or test, but is neither used as the independent variable nor affects the dependent variable or criterion.






item analysis : statistical analysis of the test items of a psychological test to determine their difficulty J 
and discriminatory powers. 

Kuder-Richardson formula 20 : mathematical formula for working out the average item intercorrelatkJ 
a test as an estimate of the average split-half correlations from all possible splittings of the t c ° n 111 

logical construct : a hypothetical quality or attribute such as intelligence, emotional stability and drive, 
which may explain unobserved events intervening between an independent variable (stimulus) and a 
behavioral response. 

norm : representative value for a group of subjects in the scale of the transformed scores of a test. 

organismic variable : such a physical or psychological characteristic of the subjects in a psychological 
experiment as may act as either an independent or an extraneous variable. 

percentile scale : an ordinal scale formed by nonlinear transformation of raw test scores to percentile 

power test : a test given enough time to enable not less than 75 % of the subjects to attempt all its items, 
which are arranged in an ascending order of difficulty. 

predictive validity : suitability of a test in predicting a specific future performance, as given by the 
correlation between its scores and the scores of a criterion for that subsequent performance. 

■2££r« , s: —TSrjcsaxssr - - * “ ™i_ 

: —- -. 

" reSUUS ° f * ,eS ‘ “ “ » on the same sample under idej 

— » scores of a test, us 

speed test : a test having test items of uniform difficulty level, to be answered within such a 
stipulated time as is too short for anyone to answer all the items. 

split-half reliability : measure of reliability of a test, obtained by correlating the scores of the two 
split halves of the test applied on the same group of subjects either simultaneously or in immediate 
succession. 

standard scores : either linearly or nonlinearly transformed raw scores such as z, T, C and stanine scores. 

stanine : normalized standard scores worked out by the nonlinear transformation of raw test scores, and 
forming a nine-point scale ranging from 1 to 9 with a mean of 5 and an SD of approximately 2. 

stimulus variable : any physical or social factor that stimulates a specific behavioral response in the 
subject of a test or experiment. 

T scores : normalized standard scores worked out by the nonlinear transformation of raw test scores, and 
forming a scale ranging from 0 to 100 with a mean of 50 and an SD of 10. 

test-retest reliability : measure of reliability given by the correlation between two sets of scores of the 
same test, applied twice on the same sample with a suitable intervening interval. 

validity : capacity of a test to measure what it is intended to measure, as indicated by the correlation 
between the scores of the given test and those of an external criterion. 





11. ANALYSIS OF VARIANCE 

An experiment is designed to study the effects of one or more independent variables on a single 
dependent variable (pages 4-6); e.g., the effect of administered doses of a hypoglycemic agent 
(independent variable) on the blood sugar level (dependent variable); the effect of a practice 
schedule (independent variable) on the performance in a particular psychological test (dependent 
variable); the effect of doses of a pesticide (independent variable) on the tracheal ventilation 
(dependent variable) of locusts; the effect of iodoacetamide (independent variable) on malt 
fermentation (dependent variable) by yeast. 

Analysis of variance is used to find whether or not the exposure of the sample to the independent 
variable has enhanced the variance of the dependent variable significantly above its variance due to 
random factors. 

11.1 EXPERIMENTAL DESIGN 

Experimental design is the scientific planning of an experiment for exploring the effect of one or 
more chosen independent variables on a specific dependent variable. It includes the following 
procedures. 

1. Opting for uncontrolled or controlled experiments : 

In uncontrolled investigations, the sample has already been exposed to uncontrolled classification 
variables such as sex, age, race, habitat, genetic error, atmospheric factors, etc., as independent 
variables. The investigator cannot “fix” or control (i) the levels (i.e., amounts, amplitudes, 
intensities, doses, frequencies, etc.) of such independent variables, (ii) the mechanism of their 
application, and (iii) the instants and the durations of application; (iv) in some cases, the 
investigator cannot apply the independent variable (e.g., a mutagen) directly on human subjects for 
studying its effects, and has to depend on human cases already exposed accidentally or in natural 
courses, or long before the experiment has been planned. Such independent variables are 
uncontrolled classification variables (page 5), free to vary at random. Use of such independent 
variables in an investigation leads to considerable random errors and ambiguous experimental 
observations. (v) Because of uncontrolled application of independent variables, the relevant 
variables (page 6) affecting the dependent variable cannot be effectively controlled or eliminated 
either; this adds further to the experimental errors. (vi) Uncontrolled experiments have often to be 
conducted by retrospective methods (see below) which are weaker, less precise and more prone to 
errors than prospective methods. 

A few examples of uncontrolled experiments are : studying effects of adolescent tobacco smoking 
habits on adult emphysema or pulmonary carcinoma, of atmospheric sulfur dioxide on tracheal 
ventilation of beetles, of childhood measles infections on adult hepatic enzyme profile, of genetic 
mutations on metabolic pathways, of sex on serum LDL-cholesterol, or of atmospheric mercury 
vapours on pulmonary ventilation. 

In controlled investigations, in contrast, independent variables are “fixed” treatment variables 
(page 5) whose levels, methods, modes and timings of application are rigorously determined and 
controlled by the investigator so as to eliminate random errors as far as possible. Moreover, 
because of controlled application, many of the relevant variables affecting the dependent variable 
may also be effectively contained or eliminated, thus 




minimizing experimental errors. These enhance the precision of the experiment and lessen 
reasonably the ambiguities and errors in the inferences to be drawn. A few examples include the 
studies of effects of chosen doses of ofloxacin on intestinal bacterial flora profile, of selective 
prefrontal lobe ablations on the learning ability of anthropoid apes, of pre-determined levels of a 
barbiturate on respiration of mitochondrial preparations, or of specific subtotal pancreatectomies 
on liver lipids. Generally, the more efficient and dependable prospective methods are used in 
investigating such controlled experiments. 

2. Choosing retrospective or prospective methods : 

Retrospective methods are frequently used for uncontrolled experiments using classification 
variables (page 5). Here, the investigation is carried backward in time to explore the possible past 
causes of changes already found in the dependent variable, such as chronic alcohol or drug 
addiction, and prenatal infection. An experimental group is drawn with individuals or cases showing 
changes in the dependent variable; a control group is also drawn with individuals or cases not 
showing those changes in the dependent variable. The past history or the record of exposure to the 
independent variable is then collected from individuals of both groups and compared for drawing 
inferences. For example, to find if prolonged standing (independent variable) is a cause of venous 
varicosity (dependent variable), past histories of prolonged standing are investigated in an 
experimental group of varicose patients and a control group of nonvaricose persons. Retrospective 
methods are useful in investigating (i) effects such as those of prolonged smoking on pulmonary 
emphysema, (ii) inherited diseases such as oculocutaneous albinism and Down syndrome, 
(iii) classification type of independent variables which are beyond the control of the investigator 
and suffer from random variations, and (iv) effects of such independent variables as cannot be 
applied on human samples directly, e.g., mutagens. Retrospective methods have, however, some 
disadvantages: past records may be vague; the samples may be neither unbiased nor truly 
representative of the corresponding populations (the patients of a hospital, for example, may not 
occur in the same proportions as in the population); and relevant variables affecting the dependent 
variable may escape from being controlled. 

Prospective methods are used for controlled experiments with treatment variables (page 5) as 
independent variables, the independent variable being applied by the investigator himself. Here, the 
investigation is continued forward in time to explore the subsequent effects of the independent 
variable being applied on a sample by the investigator himself. From a random sample, the 
investigator allocates individuals at random to the control and experimental groups. The individuals 
of the experimental groups are exposed in a strictly regulated manner to chosen levels of a “fixed” 
treatment variable under his control; individuals of the control group are excluded from the 
application of the independent variable. The dependent variable is then measured in both 
experimental and control groups, and the findings of the groups are compared. For example, an 
experimental 




group of albino rats may be injected with the chosen level of thiouracil over a period, while 
similar rats of a control group are injected with placebo containing only the solvent for the 
injectable with no drug; subsequently, the blood T4 activity (dependent variable) is estimated in 
both the groups and compared for inferring whether thiouracil has significantly changed the 
blood T4 activity. 

3. Pilot experiment : 

A pilot experiment using small groups should be carried out before further planning for the actual 
full-scale extensive investigation. It serves the following purposes: (i) it indicates the possibility 
of the anticipated effect of independent variable in the full-scale investigation to follow, (ii) it 
helps in deciding the levels of the independent variable to be applied on the sample and (iii) the 
number of replications of each level in the full-scale experiment to follow, (iv) it is used in 
working out the sample size for the full-scale experiment, and (v) it helps in planning for 
eliminating or minimizing the effects of numerous relevant variables likely to affect the dependent 
variable, and limits thereby the experimental errors. 

4. Levels and replications : 

Levels consist of the chosen amounts, intensities, amplitudes, categories, doses, etc., of the 
independent variable (factor) to be applied on the sample in the experiment. The number (k) of 
groups of individuals, used in an experiment, is determined by the number of levels of the factor 
to be applied. For example, the individuals of a random sample from the population would be 
randomly allocated to three groups (k = 3) where three levels like 0, 5 and 10 micrograms of a 
hypoglycemic factor (independent variable) are planned to be administered to the subjects in 
different groups for studying the effect of that factor on the blood sugar (dependent variable) in an 
experiment. Each level of the factor would be applied on the individuals of one of the groups while 
all the individuals of a group should be exposed to the same level of the factor. 

To minimize experimental errors, each level of the factor (independent variable) should be applied 
on more than one individual, constituting a group. This is known as the replication of each level of 
the factor. The size (n) of each group, consisting of a number of individuals chosen at random from 
the sample and exposed to a given level of the factor, is determined by the desired number of 
replications of that level. For example, each of three groups should consist of 10 individuals 
(n = 10) chosen at random from the sample, if ten replications of each of three levels of a factor 
are intended in an experiment; for this, a sample of 3 x 10 or 30 cases (n x k) is to be drawn 
initially from the population. 

5. Single-factor and factorial experiments : 

A single-factor or single-classification experiment is undertaken to explore changes in the 
dependent variable due to the exposure of the groups from a sample to the levels of a single 
independent variable. Such an experiment may be designed in several ways as follows. 

(a) Independent group experiment : For this, each group is constituted by a separate set of 
individuals, chosen at random from the sample drawn initially from the population depending on 
the laws of probability. None of the groups includes any individual, common to or associated with 
the other groups. The groups may be of identical or different sizes; large independent groups would 
consist of not less than 30 cases each (n ≥ 30) while small independent groups would have group 
sizes lower than 30 (n < 30). Each such independent group would be exposed to a specific level of 
the independent variable so that there should be as many groups as the number of intended levels 
of the latter, while the group size would 





depend upon the number of replications of the 
level to be applied on that group. One of these 
groups, called the control group, would be 
exposed to a level of treatment that is free 
from the independent variable; the others, 
called the experimental groups, would receive 
different other levels of treatment consisting of 
respective specific doses or amounts of the 
independent variable. As the groups consist of 
separate sets of independent individuals, the 
dependent variable scores of different groups 
would bear no correlation with each other 
(pages 124-125). 
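
The random allocation of a sample to independent groups may be sketched in Python as below. This is only a minimal illustration, not a procedure given in the text: the function name allocate_to_groups, the serial numbers 1 to 30 and the seed are assumptions, while the three levels (0, 5 and 10 micrograms) and the group size (n = 10) follow the example under levels and replications.

```python
import random


def allocate_to_groups(sample_ids, k, n, seed=None):
    """Randomly allocate k * n individuals to k independent groups of size n.

    Hypothetical helper: shuffles the serial numbers of the sample and cuts
    the shuffled list into k consecutive, non-overlapping groups.
    """
    sample_ids = list(sample_ids)
    if len(sample_ids) != k * n:
        raise ValueError("sample size must equal k * n")
    rng = random.Random(seed)
    rng.shuffle(sample_ids)
    return [sample_ids[i * n:(i + 1) * n] for i in range(k)]


# Three levels of the factor; the 0 microgram level serves as the control group.
levels_ug = [0, 5, 10]
groups = allocate_to_groups(range(1, 31), k=3, n=10, seed=1)

for level, members in zip(levels_ug, groups):
    print(f"{level} micrograms:", sorted(members))
```

No individual occurs in more than one group, so the scores of the groups remain independent of each other.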

(b) Single-group experiment : In such an 
experiment, the same group of individuals, 
randomly sampled from a population, serves 
first as the control group and subsequently as 
the experimental group(s); each time, the same 
group would be exposed to a separate specific 
level of the independent variable, after which 
the dependent variable scores would be 
measured in its individual cases (pages 132- 
133). Being constituted by the same single 
group of individuals, the control group as well 
as each experimental group has the same size. 
A large single group would consist of 30 or 
more individuals (n ≥ 30) while a small single 
group would have less than 30 cases (n < 30). 
Because the same group of individuals is used 
for successive levels of exposure to the 
independent variable, each individual would yield 
a pair of dependent variable scores after every 
pair of consecutive treatment levels; the two 
scores of each such pair would bear a relation 
to one another. Thus, the dependent variable 
scores after successive treatment levels would 
constitute paired and correlated observations. 
For example, the knee jerk reflex strengths may 
be initially measured in a group of athletes 
after injecting them with placebo free from 
adrenaline (control group); the same group is 
next injected with a dose of adrenaline and the 
knee jerk strength is measured again 
(experimental group). These two sets of knee jerk 
scores form paired and correlated observations. 


(c) Matched-pair equivalent group experiment : 
For this, either the intended dependent 
variable, or some other variable considered as 
related to the latter, is measured initially in all 
individuals of a random sample. Next, two 
equivalent or matched-pair groups are 
constituted by including in one group such 
individuals, each of whom is matched with an 
individual of the other group with respect to 
the initially measured variable (page 139). The 
two matched groups are then treated with two 
different levels of the independent variable; one 
of these groups may thus be treated with 
placebo (level 1) to serve as the control group 
while the other may be treated with a given 
dose of the independent variable (level 2) to 
serve as the experimental group. The dependent 
variable is subsequently measured in the 
individuals of both groups; these two sets of 
final scores of dependent variable constitute 
paired and correlated observations. Matched- 
pair groups are identical in size and may be 
large (n ≥ 30) or small (n < 30). 

(d) Matched-mean equivalent group 
experiment : Here, individuals are so allocated 
from sample to groups that group means of 
initially measured scores are equal, but individual 
scores of two groups may not be matched 
in pairs. 

(e) Randomized block experiment : An 
experiment may be designed this way if the 
sample has been drawn at random from a 
population which has r number of classes, 
strata or categories bearing varied relations with 
the dependent variable to be studied. For this, 
the entire sample, consisting of n number of 
individuals, is divided into r number of blocks. 
Each block consists of k number of individuals 
belonging to a specific one of the classes or 
strata; k is also the number of levels of the 
independent variable planned to be applied. 
Thus, n equals r x k. The individuals of each 
block are allocated at random to different levels 
of treatment so that (i) each level of treatment 
is applied on only one individual of each 




block, and (ii) all the levels are applied on one 
or other of the members of each block. So, r is 
the number of replications of each of the k 
number of levels. After exposure of the blocks 
to the levels of treatment, the dependent 
variable is finally measured in all the individuals 
of each block. The dependent variable 
scores of such individuals, as have been 
exposed to the same level of treatment, are 
then arranged in a group; this is repeated for 
all the levels of treatment used. The scores of 
different groups are then compared and 
analyzed statistically. 
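
The allocation within blocks described above may be sketched in Python as follows. The sketch is illustrative only: the function name randomized_block_design, the block and member labels and the seed are hypothetical, and r = 4 blocks with k = 3 levels are chosen merely as an example.

```python
import random


def randomized_block_design(blocks, levels, seed=None):
    """Within each block, assign every treatment level to exactly one member.

    blocks: dict mapping a block (stratum) name to its k members.
    levels: the k levels of the independent variable.
    Returns a dict: block name -> {member: level}.
    """
    rng = random.Random(seed)
    design = {}
    for block_name, members in blocks.items():
        if len(members) != len(levels):
            raise ValueError("each block must contain exactly k individuals")
        order = list(levels)
        rng.shuffle(order)  # random allocation of levels inside the block
        design[block_name] = dict(zip(members, order))
    return design


# Hypothetical sample: r = 4 blocks of k = 3 individuals each (n = r x k = 12).
levels = ["level 1", "level 2", "level 3"]
blocks = {f"block {b}": [f"{b}{i}" for i in (1, 2, 3)] for b in "ABCD"}
design = randomized_block_design(blocks, levels, seed=7)

# Rearranging the individuals by level gives one group per level,
# each with r = 4 replications.
groups = {lv: [m for blk in design.values() for m, a in blk.items() if a == lv]
          for lv in levels}
print(groups)
```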

A factorial experiment, in contrast to the 
single-factor experiment, is designed to study 
the effect of combinations of different chosen 
levels of more than one independent variable 
(factor) on a given dependent variable. It may 
be a two-way, three-way or r-way classification 
experiment according to the number of 
factors applied. For example, in a two-way 
experiment to study the effects of cold 
exposure and thyrotropin administration on the 
thyroid, there are as many 
groups as the number of combinations of levels 
of the two independent variables and each 
group is made of as many individuals as the 
number of replications of the relevant 
combination. Thereafter, the individuals of each 
group are treated with a specific combination 
of levels of the independent variables. 


6. Sample size : 

The size of the sample should be worked out statistically before drawing a sample for the 
experiment. The sample should be sufficiently large to ensure that (i) the sample is truly 
representative of the population and includes individuals of different types or categories in the 
same proportions in which they occur in the population (the smaller the sample, the higher is the 
probability of non-inclusion of rare cases into it, thus making the sample less representative of 
the population), (ii) the distribution of scores of the dependent variable in the sample conforms to 
either normal or t distributions so as to enable the use of parametric methods for statistical 
inferences from the experimental data, (iii) the SD of those scores does not suffer from downward 
bias due to the non-inclusion of rare cases from the population in the sample, (iv) a large sample 
size narrows the H0 and Ha distributions to decrease the area of their overlap, and consequently 
lowers the probability of type II error (β) of inference (page 114) without enhancing the 
probability of type I error (α), (v) it thereby increases the power of a test in detecting genuine 
differences as its power is given by (1 − β), and (vi) it decreases the errors due to planned 
exclusion of the rest of the population. 

The sample size (n) is frequently estimated, using the unbiased SD (s) from the pilot experiment 
as an estimate for the population SD (σ), the critical deviation (D) which is the specified 
difference between the means (μ0 and μa) of respectively the H0 and Ha distributions, and a chosen 
level (α) of significance. For a one-tail test for difference between two means, either the critical 
t score with ∞ degrees of freedom and for the chosen one-tail α, or the critical z score (z_α) for 
that α, may be used for estimating n. 

$$D = \mu_a - \mu_0; \qquad n = \frac{s^2 z_{\alpha}^2}{D^2}, \quad \text{or} \quad n = \frac{s^2 t_{\alpha(\infty)}^2}{D^2}$$

For a two-tail test, either the critical z_{α/2} or the critical t_{α/2(∞)} is used, beyond which 
lies the fractional area (α/2) in each tail of respectively the normal and the t distributions. 

$$n = \frac{s^2 z_{\alpha/2}^2}{D^2}, \quad \text{or} \quad n = \frac{s^2 t_{\alpha/2(\infty)}^2}{D^2}$$
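
The estimation of n from s, D and α may be sketched in Python as below. This is a hedged illustration only: the function name sample_size and the numerical values (s = 10, D = 5, α = 0.05) are assumptions and not data from the text. The critical z score is taken from the standard normal distribution; the critical t score with infinite degrees of freedom coincides with it. The result is rounded up because n must be a whole number of cases.

```python
import math
from statistics import NormalDist


def sample_size(s, D, alpha, two_tail=True):
    """Estimate n = s^2 * z^2 / D^2 for a chosen level of significance.

    s        : unbiased SD from the pilot experiment (estimate of sigma)
    D        : critical deviation, i.e. the specified difference of means
    alpha    : chosen level of significance
    two_tail : use z for alpha/2 in each tail if True, else z for alpha
    """
    tail_area = alpha / 2 if two_tail else alpha
    z = NormalDist().inv_cdf(1 - tail_area)  # critical z score
    return math.ceil((s * z / D) ** 2)


# Hypothetical pilot result: s = 10, with a critical deviation D = 5.
n_two_tail = sample_size(s=10, D=5, alpha=0.05, two_tail=True)
n_one_tail = sample_size(s=10, D=5, alpha=0.05, two_tail=False)
print(n_two_tail, n_one_tail)  # 16 11
```

A smaller D or a larger s raises the estimate of n, as is evident from the formula.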


7. Randomization : 

Randomization should be ensured both in 
sampling and in treatment with the independent 
variable. 









A. Random sampling : 

To make a sample fairly representative of 
the population, probability sampling is used to 
choose at random the required number of 
individuals from a population for inclusion in 
the sample. This depends on (i) laws of 
probability and (ii) frequencies of different 
types of individuals in the population, and (iii) 
leaves no scope for arbitrary choice by any 
person. 

(a) Simple random sampling : This is the 
random choice of individuals for a sample from 
the undivided whole of a small, finite and 
homogeneous population, depending on laws of 
probability. It ensures that (i) no element of 
conscious or unconscious bias, whim or 
personal factor of the investigator affects the 
choice, (ii) each member of the population has 
an equal probability of being chosen for the 
sample, (iii) the choice of any member is 
independent of the choice or exclusion of any 
other member, and (iv) proportions of different 
types of individuals in the sample conform to 
their respective proportions in the population. 

In the simple card drawal method, all members 
of the population are given serial numbers 
which are entered on separate cards; after 
mixing the cards together, as many of them are 
picked up at random as the required sample 
size, and the corresponding individuals of those 
chosen cards are included in the sample. The 
more scientific random number method is 
based on the choices from digits of a random 
number table of such digits arranged in 
a random order, or from a random number generator 
(pages 8-9). For sampling with replacement, a 
member once chosen for the sample continues 
to be considered for all subsequent choices, 
leaving the probability of choice of 
every individual unchanged and identical at 
any step of choice; but in sampling without 
replacement, an individual once chosen is kept 
out of all subsequent choices so that the 
probability of choice rises, though by small 
degrees, from choice to choice instead of 
remaining unchanged; this needs a modification 
of formulae used subsequently in 
computing standard errors from such a sample 
(pages 74-75). 
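
The two modes of simple random sampling may be sketched in Python with the standard random module, used here in place of a printed random number table. The population of 100 serial numbers, the sample size of 10 and the seed are hypothetical values chosen only for illustration.

```python
import random

rng = random.Random(42)            # fixed seed only to make the sketch repeatable
population = list(range(1, 101))   # serial numbers of a hypothetical population
n = 10                             # required sample size

# Sampling without replacement: no member can be chosen twice.
without_replacement = rng.sample(population, k=n)

# Sampling with replacement: a chosen member stays available for later choices.
with_replacement = rng.choices(population, k=n)

# Probability of choice of any remaining member at each successive step.
N = len(population)
prob_with = [1 / N] * n                         # unchanged from choice to choice
prob_without = [1 / (N - i) for i in range(n)]  # rises by small degrees

print(without_replacement)
print(with_replacement)
print(prob_without[0], prob_without[-1])
```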

(b) Stratified random sampling : This 
method is used for a large and heterogeneous 
population with distinct strata. Here, simple 
random sampling is applied separately on each 
stratum, drawing as many individuals from the 
latter as would correspond to the proportional 
size of that stratum in the population; the 
proportions of individuals in different strata of 
the sample would thus conform to those in the 
respective strata of the population. Moreover, 
the probability of getting chosen remains 
identical for all members of a stratum, but 
varies from stratum to stratum according to the 
proportional sizes of strata in the population. 
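
Proportional allocation to strata may be sketched in Python as below. The function name stratified_sample, the stratum names and sizes, and the seed are all hypothetical. The quota of each stratum is simply rounded, which is an assumed simplification; with awkward stratum sizes the rounded quotas may not add up exactly to the intended sample size.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """Apply simple random sampling separately on each stratum.

    strata: dict mapping a stratum name to the list of its members.
    The quota of a stratum is proportional to its size in the population.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        quota = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, quota)
    return sample


# Hypothetical population of 1000 individuals in three strata.
strata = {
    "stratum 1": [f"s1-{i}" for i in range(500)],
    "stratum 2": [f"s2-{i}" for i in range(300)],
    "stratum 3": [f"s3-{i}" for i in range(200)],
}
sample = stratified_sample(strata, sample_size=50, seed=3)
print({name: len(chosen) for name, chosen in sample.items()})
```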

(c) Multistage sampling : This method is 
used for a vast and widespread population. 
Utilizing some existing steps in the population, 
it is divided into a number of successive stages 
ranging from the entire population to the 
individuals. Simple random sampling is applied 
at each stage separately and successively (page 
). For example, a few first-stage units are first 
randomly chosen from the population, then a 
few second-stage units are chosen at random 
from the chosen first-stage units, and a specific 
number of individuals may next be chosen 
from the chosen second-stage units, as the 
third-stage units, to form the sample. 

(d) Systematic sampling : Sometimes, all 
members of a population arrive, or turn out, or 
are arranged, serially in a systematic order, 
such as fishes angled serially from a water-body, or 
patients arriving serially at the out-patient 
clinic. In such cases, the first member for the 
sample is chosen at random; this is 
followed by the choice of other members at 




predetermined regular intervals along the serial order of their turn-outs. 

B. Random treatment : 

Wherever possible, different levels of the treatment variable should be applied in sequences 
varying at random from individual to individual of the sample. This would prevent or eliminate the 
order effect of the treatment, which may result if the levels of treatment are applied in the same 
sequence to all the individuals. 

8. Minimizing effects of relevant variables : 

In any experimental system, there always occur many relevant variables which are not intended to 
be applied in that experiment, but nevertheless may influence or affect the dependent variable of 
the experiment or otherwise sway the experimental result (page 6). Effects of all such relevant 
variables that may interfere with the aimed investigation need to be countered or minimized by 
proper designing of the experiment and its proceedings. Thus, subject-relevant variables, such as 
quantitative or qualitative variations of age, sex, race, strain, body weight, attitude, aptitude, 
genotype and phenotype of subjects chosen for the sample, may affect the experimental outcome, 
though not so desired, and should therefore be minimized by properly designed random probability 
sampling. Situational relevant variables, such as unintended variations of pH, temperature, ionic 
concentrations, osmolarity, noise decibels, light intensities, workshop ventilation, etc., in the 
experimental medium and system or in the surroundings, should be controlled and stabilized (by 
using buffers, thermostats, voltage stabilizer, etc., for instance) so as not to affect the 
investigation. Sequence-relevant variables should be eliminated by randomizing the sequence of 
applications of different levels of experimental treatments from subject to subject in the sample, 
to avoid the order effect of applications (see above). 

9. Statistical treatments : 

The data obtained in any experiment or investigation have eventually to be subjected to statistical 
tests for analysis, interpretation, inference and prediction. Because the application of each 
statistical test depends on the justifiability of its specific assumptions in the pertinent 
investigation, the latter should be so designed as to conform to the assumptions and justify the 
application of a powerful and parametric test to the obtained data. Where the significance of 
difference between group means would be sought after collecting the data, Wilcoxon’s composite 
rank test, Student’s t test and analysis of variance (anova), three possible alternatives, possess 
progressively higher powers and dependabilities; so, the experimental design should endeavour to 
conform to the assumptions for anova and to enable thereby the use of that test in preference to 
the two preceding tests. If both a parametric test and its nonparametric alternative are available, 
the former should be preferable as it is more powerful than the latter so long as the assumptions 
for the former are justifiable; so, the experiment should be so designed as to make the assumptions 
of the parametric test justifiable; the nonparametric test should be taken recourse to, only if the 
assumptions for its parametric alternative are not justifiable. This explains why the 
product-moment correlation (r) is preferable to Spearman’s or Kendall’s rank-dependent 
correlations, so long as the assumptions of r are justifiable; moreover, a significant r, in 
contrast to the other two, leads to further explorations by parametric partial and multiple linear 
correlations into associations involving more than the two originally explored variables. 

While designing an experiment, the qualitative and quantitative laboratory tests should also be 
chosen on the basis of their reliabilities and validities, as measured statistically. An experiment 
or test has a higher reliability if it measures or assesses the intended variable more consistently 
with lesser errors, and possesses a higher validity if it can measure that variable in exclusion of 
most or all of the other closely related variables (pages 257 and 263). 

11.2 ANALYSIS OF VARIANCE 

Analysis of variance (anova) is an extension 
and generalization of Student’s t test, but (i) is 
preferable to the latter because anova is far 
more powerful, (ii) can be applied to two or 
more groups simultaneously, (iii) can be used 
in estimating the strength of association 
between the dependent variable and the 
independent variable, and (iv) also helps in 
minimizing experimental errors because 
experiments have to be designed more 
rigorously to conform to its assumptions. 

It tests the differences between two or more 
groups by comparing components of their 
variance. For example, in the 
follow-up of an experiment using a single 
independent variable, the anova (one-way 
anova) analyzes different components of the 
total variance ($s_t^2$) of the sample to estimate the 
relative magnitudes of the within-groups 
variance ($s_w^2$) due to uncontrolled random 
factors, and the between-groups variance ($s_b^2$) 
which may have been influenced by the applied 
independent variable; it aims at finding out 
whether or not $s_b^2$ can be explained away by 
the null hypothesis that it does not differ 
significantly from $s_w^2$. The basic anova 
statistic is the variance ratio or F ratio 
(§ 11.3). 

Classification of anova 

The method of anova differs according to 
the number of independent variable(s) used in 
the experiment. 

A single-classification or one-way anova is 
used to investigate the effects of a single 
independent variable on the dependent variable. 
The number of applied levels of the 
independent variable determines the number of 
groups in the experiment. The size of each 
group equals the number of replications of the 
given level of the independent variable. For 
example, a one-way anova may be applied to 
tracheal ventilation values (dependent variable) 
measured in three groups of 20 insects each 
after their exposure to three respective doses of 
a pesticide (independent variable), to find if the 
pesticide changes the tracheal ventilation 
significantly. 
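
The comparison of between-groups and within-groups variances in a one-way anova may be sketched in Python as below. This section defers the details of the F ratio to § 11.3, so the sketch assumes the usual definition of F as the between-groups mean square divided by the within-groups mean square. The function name one_way_anova_f and the three small groups of scores are hypothetical and serve only to show the arithmetic.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores.

    Assumes the usual one-way anova partition: the between-groups and
    within-groups sums of squares are each divided by their degrees of
    freedom, and F is the ratio of the two resulting variances.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores of three groups exposed to three levels of one factor.
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_b, df_w = one_way_anova_f(scores)
print(round(f_ratio, 4), df_b, df_w)  # 13.0 2 6
```

The computed F is then compared with the critical F for the two degrees of freedom at the chosen level of significance.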

Higher orders of anova such as two-way 
and three-way anovas are used in a factorial 
experiment where the simultaneous effects of 
more than one independent variable are being 
investigated (page 285). The number of groups 
used corresponds to the chosen number of 
combinations of different levels of the 
independent variables, each such combination 
being applied on one of the groups. The size of 
each group corresponds to the desired number 
of replications of each combination of the 
independent variables. 

Models of anova 

Different models of anova have to be used 
according to the nature(s) of independent 
variable(s) in the experiment. 

(a) Fixed model or model 1 anova : 

This model is used in exploring “fixed” or 
controlled treatment effects. In other words, it 
analyzes the variances of a dependent variable 
in experiments using “fixed” experimental 
treatments) as independent variable(s). A 
model I anova is thus used for studying the 
effects of chosen and controlled levels of drugs, 
hormones, ions, radiations, experimental lesions 
or ablations of brain, temperature, pH, light, 
osmolality, etc., on physical properties, 
chemical constituents, structural components, 
activities, functional aspects and behaviours of 
organisms. It is also used in studying the
effects of practice, learning methods, etc., on
performance. A one-way model I anova, for
example, explores whether or not the between-
groups variance ($s_b^2$) contains an added
treatment component, owing to the exposure of
different groups of subjects to different levels of a
single treatment variable, distinct from and in
addition to the variances associated with
random sampling (§ 11.3). In model I anova,
the independent variables, strictly controlled by
the investigator, do not suffer from random
changes. So, a model I anova, showing
significant results, may be followed by the
estimation of the strength of association
between the dependent and the independent
variables by working out omega square (§
11.3).


(b) Random model or model II anova :

This model is used in exploring the effects 
of chosen random factors on the dependent 
variable. In other words, it analyzes the 
variances of a dependent variable in 
experiments where the groups have been 
exposed to independent variable(s) such as sex, 
race, age, genotypes, habitats, home 
environments, atmospheric temperature,
atmospheric pollutants and cosmic rays, which
are randomly changing classification variables,
largely beyond the control of the investigator.
Thus, it may explore the differences between
groups drawn from different breeds, genotypes,
phenotypes, natural habitats, races, socio-economic
status and home environments. In a
one-way model II anova, for example, it is
explored whether or not the between-groups
variance contains an added variance
component, absent within the groups, but
present between the latter subsequent to their
exposure to the independent variable(s). In
model II anova, the uncontrolled independent
classification variables suffer from random
fluctuations. So, a significant result in model II
anova is not followed by working out omega
square for the strength of association between
the dependent and independent variables.

(c) Mixed model or model III anova :

Unlike models I and II which may be either 
one-way or of higher orders, model III is 
always a two-way or a still higher order of 
anova where some independent variable(s) must 
be “fixed” experimental treatment(s) while the 
other(s) must be uncontrolled classification 
variable(s). The effects of two or more levels 
of a “fixed” treatment variable may also be 
explored by model III anova in a single-group 
experiment , because the randomly chosen 
subjects or cases of the group constitute a 
classification variable affecting the dependent 
variable in addition to the treatment variable 
used (§ 11.6). 

Assumptions of anova 

(a) Random assignment : 

(i) The experimental design should provide
for random sampling so that each individual of 
the population has an equal probability of 
being chosen for a group, and the choice of 
each individual is independent of the choice of 
others. 

(ii) Randomization of treatment should also
be ensured for different levels of the 
independent variable(s), wherever possible. For 
this, all individuals of the sample should be 
allocated at random to different sequences of 
application of the chosen levels of the 
independent variable. The dependent variable is 
measured in each individual, following the 
treatment with each level of the independent 
variable. After all the levels have been applied 
on the individuals in sequences varying at 
random from individual to individual, the 
measured scores of the dependent variable are 
arranged into groups according to the 
respective applied levels of the independent 
variable. The scores of different groups, thus 
constituted, are then subjected to anova. Such
randomized treatment eliminates the order
effect which may result if different levels are 
applied in the same sequence to all the 
individuals. 
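
The randomization of treatment sequences described above may be sketched in Python as follows. This is only an illustrative sketch: the function names `randomized_sequences` and `group_by_level`, the individual labels and the dose levels are hypothetical and do not come from the text.

```python
import random


def randomized_sequences(individuals, levels, seed=None):
    """Give every individual its own random order of the treatment levels."""
    rng = random.Random(seed)  # a seed is used only to make the sketch repeatable
    schedule = {}
    for individual in individuals:
        order = list(levels)   # copy, so the original list of levels is untouched
        rng.shuffle(order)     # random sequence of application for this individual
        schedule[individual] = order
    return schedule


def group_by_level(scores):
    """Rearrange {individual: {level: score}} into {level: [scores]} for anova."""
    groups = {}
    for per_level in scores.values():
        for level, score in per_level.items():
            groups.setdefault(level, []).append(score)
    return groups


# Hypothetical individuals and levels, for illustration only.
schedule = randomized_sequences(["A", "B", "C", "D"], ["low", "medium", "high"], seed=1)
for individual, order in schedule.items():
    print(individual, order)
```

Every individual still receives every level; only the order of application differs, which is what removes the order effect.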

(b) Normal distribution : 

The dependent variable should have a 
normal distribution in the population. Stated 
otherwise, it should be reasonable to assume 
that the error terms, i.e., the deviations of 
individual scores from the respective group 
means, are distributed normally. 

(c) Independence of errors : 

The error terms, i.e., the deviations of 
individual scores from the group mean, should 
be independent of each other. This is an 
alternative form of the assumption that the 
individual scores occur at random and 
independent of each other. 

(d) Homoscedasticity : 

The assumption of homoscedasticity implies
that the groups drawn for an experiment
possess homogeneous variances initially ; in
other words, they should have been drawn from
the same population (or closely similar
populations) so that their initial variances may
be considered as different estimates of the 
same population variance, differing only due to 
their sampling errors. It should thus be 
reasonable to assume that the error terms of 
individuals of different groups have 
homogeneous dispersions. 

(e) Additivity : 

It should be reasonable to assume that 
different factors, including the independent 
variables used, produce separate bits of 
variations of the dependent variable and these 
variations add up to give the total variation of 
the latter. This additive property of variations, 
due to different factors, enables the analysis of 
the total variance ($s_t^2$) of the dependent variable
into its various components. 


Nonlinear transformations for normalization
of scores :

In case of a gross deviation of the
scores of the dependent variable from the
normal distribution, the following non-linear
transformations of raw scores may be tried for
getting normalized transformed scores (pages
75 and 265).

(i) Logarithmic transformation : This is
tried for reducing the positive skewness of the
raw score distribution by changing the raw
scores into their logarithms ; where $X_t$ is the
transformed score, $X_t = \log X$. It is worked out
when the variance of raw scores tends to be
directly proportional to their squared mean. pH
is such a transformation.

(ii) Square root transformation : It is often
used to decrease the skewness of the raw score
distribution where the variance of raw scores
tends to be directly proportional to their mean.
The transformed score is here the square root
of the raw score : $X_t = \sqrt{X}$.

(iii) Reciprocal transformation : It is often
undertaken when the SD of raw scores tends to
be proportional to their squared mean. Each
raw score is changed into its reciprocal :
$X_t = 1/X$.

(iv) Arcsine transformation : It is tried when
the variance of a distribution of proportions (p)
is roughly proportional to X(1 - X). It serves
to lengthen the tails of the distribution and to
compress its central part. The transformed score
is the angle (degree or radian) whose sine is the
square root of the raw proportion :
$X_t = \sin^{-1}\sqrt{p}$.
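
The four transformations listed above can be sketched in Python as follows. The function names are hypothetical, and the use of base-10 logarithms is an assumption; the text says only "logarithms".

```python
import math


def log_transform(x):
    """Logarithmic transformation, X_t = log X (base 10 is assumed here)."""
    return math.log10(x)


def sqrt_transform(x):
    """Square root transformation, X_t = sqrt(X)."""
    return math.sqrt(x)


def reciprocal_transform(x):
    """Reciprocal transformation, X_t = 1 / X."""
    return 1.0 / x


def arcsine_transform(p, degrees=True):
    """Arcsine transformation: the angle whose sine is the square root of p."""
    angle = math.asin(math.sqrt(p))  # angle in radians
    return math.degrees(angle) if degrees else angle


# Hypothetical raw scores, for illustration only.
print(log_transform(100.0), sqrt_transform(9.0),
      reciprocal_transform(4.0), arcsine_transform(0.25))
```

A proportion of 0.25 has a square root of 0.5, whose arcsine is an angle of 30 degrees.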

The transformed scores in each of these
cases are tested by either chi square or G test
for the normality of their distribution. If the
test indicates their significant goodness of fit
with the normal distribution, the transformed
scores may be used for anova. Otherwise, the
raw scores may be used for nonparametric
tests.

11.3 ONE-WAY ANOVA

A one-way anova investigates the effects of
a single independent variable on the dependent
variable. It is undertaken to find whether or not
the exposure of different groups of subjects or
cases to different levels of a single independent
variable has produced significant differences in
the variance between the groups. For the
assumptions of one-way anova, see pages 280-
281. One-way anova may be either model I or
model II according as the independent variable
is a “fixed” experimental treatment or an
uncontrolled classification variable (pages 279-
280) ; it cannot be a model III anova which is
always a higher order of anova with more than
one independent variable.

Computation of variance ratio 

One-way anova consists of the computation
and interpretation of the anova statistic F which
is the variance ratio between the between-
groups variance ($s_b^2$) and the within-groups
variance ($s_w^2$) in this class of anova.

$$F = \frac{\text{between-groups variance}}{\text{within-groups variance}} = \frac{s_b^2}{s_w^2}.$$

So, for computing the F ratio in a one-way
anova, the two variance components, viz., $s_b^2$
and $s_w^2$, of the total variance ($s_t^2$) of the
dependent variable have to be worked out. But
$s_t^2$ cannot be directly partitioned into these
components, because variances are not additive.
However, the sums of squares as well as their
degrees of freedom, the ratios of which are the
respective variances, are additive. So, the total
sum of squares is first partitioned into its
component sums of squares which are then
divided by their respective degrees of freedom
to obtain the variance components.

1. Partitioning of sums of squares : 

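(a) The total sum of squares ($SS_t$) is the
sum of squared deviations of all individual
scores (viz., $X_1$, $X_2$, etc., of the respective
groups) from their grand mean ($\bar{X}$). If there
are k number of groups with the respective
sizes of $n_1, n_2, \ldots, n_k$,

$$N = \Sigma n_i = n_1 + n_2 + \ldots + n_k ;$$

$$\bar{X} = \frac{\Sigma X_1 + \Sigma X_2 + \ldots + \Sigma X_k}{N} = \frac{\Sigma X_1 + \Sigma X_2 + \ldots + \Sigma X_k}{n_1 + n_2 + \ldots + n_k} ;$$

$$SS_t = \Sigma\Sigma (X_i - \bar{X})^2 = \Sigma (X_1 - \bar{X})^2 + \Sigma (X_2 - \bar{X})^2 + \ldots + \Sigma (X_k - \bar{X})^2 ;$$

$$\text{or, } SS_t = \Sigma X_1^2 + \Sigma X_2^2 + \ldots + \Sigma X_k^2 - \frac{(\Sigma X_1 + \Sigma X_2 + \ldots + \Sigma X_k)^2}{N} .$$

$$df_t = N - 1,$$

because one df is lost in using $\bar{X}$ as an
estimate of the parametric mean $\mu$.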

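(b) The between-groups sum of squares
($SS_b$) is computed by multiplying the squared
deviation of each group mean ($\bar{X}_i$) from the
grand mean with the number ($n_i$) of scores in
that group as its weight, and totalling the
resulting products for all the groups. $SS_b$ has
(k - 1) degrees of freedom because k number of
group means are being considered and one df
is lost in using ($\bar{X}$) as an estimate of $\mu$.

$$\bar{X}_1 = \frac{\Sigma X_1}{n_1} ; \quad \bar{X}_2 = \frac{\Sigma X_2}{n_2} ; \quad \ldots ; \quad \bar{X}_k = \frac{\Sigma X_k}{n_k} .$$

$$SS_b = \Sigma [n_i (\bar{X}_i - \bar{X})^2] = n_1(\bar{X}_1 - \bar{X})^2 + n_2(\bar{X}_2 - \bar{X})^2 + \ldots + n_k(\bar{X}_k - \bar{X})^2 ,$$

$$\text{or, } SS_b = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \ldots + \frac{(\Sigma X_k)^2}{n_k} - \frac{(\Sigma X_1 + \Sigma X_2 + \ldots + \Sigma X_k)^2}{N} .$$

$$df_b = k - 1.$$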




(c) The within-groups sum of squares ($SS_w$)
is the sum of squared deviations of all
individual scores ($X_i$) from their respective
group means ($\bar{X}_i$). It has (N - k) as its df
because there is a total of N number of scores,
and k number of group means ($\bar{X}_1$, $\bar{X}_2$, etc.) are
used as estimates of $\mu$.

$$SS_w = \Sigma\Sigma (X_i - \bar{X}_i)^2 = \Sigma (X_1 - \bar{X}_1)^2 + \Sigma (X_2 - \bar{X}_2)^2 + \ldots + \Sigma (X_k - \bar{X}_k)^2 .$$

$$df_w = N - k.$$

However, $SS_w$ is usually worked out as
follows depending on the additive properties of
the sum of squares and their df.

$$SS_w = SS_t - SS_b ; \quad df_w = df_t - df_b = N - k.$$

2. Working out of variances and F ratio 

(a) Each sum of squares is divided by its df
to give the corresponding variance or mean
square ($s^2$).

$$s_t^2 = \frac{SS_t}{df_t} = \frac{SS_t}{N - 1} ; \quad s_b^2 = \frac{SS_b}{df_b} = \frac{SS_b}{k - 1} ; \quad s_w^2 = \frac{SS_w}{df_w} = \frac{SS_w}{N - k} .$$

Of these mean squares, $s_t^2$ is the total
variance, serving as an estimate of the
dispersions of scores of all the groups around
the grand mean and having the df of (N - 1).
Similarly, $s_w^2$ is the within-groups variance or
error variance (df = N - k) and constitutes an
unbiased estimate of the population variance
($\sigma^2$), made on the basis of the deviations of
individual scores of the groups from the
respective group means. It results from the
uncontrolled random variations of individual
scores from the corresponding group means
owing to random factors like genetic,
environmental and social variables, and
estimates the variations produced by random
sampling. On the contrary, $s_b^2$ is the between-
groups variance. So long as all the groups
may be justifiably assumed to have come from
the same population, to obey homoscedasticity,
and not to have changed significantly in spite of
their exposure to different levels of the
independent variable, $s_b^2$ serves as another
independent estimate of the population variance
($\sigma^2$), made on the basis of the deviations of
group means from the grand mean. In other
words, so long as the variance of the dependent
variable has not changed significantly owing to
the independent variable, $s_b^2$ estimates the same
error variance as is estimated by $s_w^2$. But where
the independent variable has brought about a
significant change in the dependent variable,
$s_b^2$ also includes an added component due to such
change, over and above the error variance
estimated in common by both $s_b^2$ and $s_w^2$. As
the variance due to the uncontrolled factors is
common to both $s_b^2$ and $s_w^2$, variances are not
additive : $s_t^2 \neq s_b^2 + s_w^2$.


(b) $s_b^2$ and $s_w^2$ are used
in working out the F ratio or variance ratio.
The denominator of the F ratio is called the
denominator variance or the lesser mean
square and is an estimate of the error variance
of the dependent variable, resulting from
uncontrolled factors associated with
random sampling. The numerator of the F
ratio is called the numerator variance or the
greater mean square, and consists of the
variance estimate whose source and
significance are being investigated. The degrees
of freedom of the F ratio are those of the two
variances used in its computation, viz., $df_b$ of
the greater mean square and $df_w$ of the
lesser mean square.

Thus, for a one-way anova,

$$F = \frac{\text{greater mean square}}{\text{lesser mean square}} = \frac{s_b^2}{s_w^2} ; \quad df \text{ of } F : k - 1, \; N - k.$$
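
The partitioning of the sums of squares and the F ratio described above may be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name `one_way_anova`, the dictionary keys and the three small groups of scores are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def one_way_anova(groups):
    """Partition SS_t into SS_b and SS_w and return the F ratio with its df."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # N
    grand_mean = sum(sum(g) for g in groups) / n_total

    # SS_t: squared deviations of all scores from the grand mean.
    ss_t = sum((x - grand_mean) ** 2 for g in groups for x in g)
    # SS_b: squared deviation of each group mean, weighted by group size.
    ss_b = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # SS_w: obtained by subtraction, using the additivity of sums of squares.
    ss_w = ss_t - ss_b

    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w             # s_b^2 and s_w^2
    return {"ss_t": ss_t, "ss_b": ss_b, "ss_w": ss_w,
            "df_b": df_b, "df_w": df_w,
            "ms_b": ms_b, "ms_w": ms_w, "f": ms_b / ms_w}


# Hypothetical scores, for illustration only.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

For these scores the total sum of squares of 32 splits into 26 between groups and 6 within groups, so the two mean squares are 13 and 1 and the F ratio is 13 with df of 2 and 6.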




3. Significance of computed F :

The $H_0$ for one-way anova contends that
there is no significant difference between the
groups and their means are estimates of the
same population mean $\mu$ as estimated by the grand
mean. If $H_0$ is correct, $s_b^2$ would not differ
significantly from $s_w^2$ and would not contain
any added component other than the error
variance estimated by $s_w^2$, so that the computed
F (i.e., the $s_b^2 / s_w^2$ ratio) would not differ
significantly from 1.00. Thus, if the computed
F is found not significant, there is no added
component of variance in $s_b^2$ and no significant
difference between the groups.

But where the computed F turns out to be
significant, $s_b^2$ would contain a significant
added component owing to variations produced
between the groups by their exposure to
different levels of the independent variable, and
would thus be significantly higher than $s_w^2$ ;
the $H_0$ would then be rejected and it would be
inferred that there are significant differences
between the groups and between their means.
The added component in the $s_b^2$ is called an
added treatment component when due to the
effect of a “fixed” treatment variable, and an
added variance component when resulting from
exposure to an uncontrolled classification
variable as the independent variable.

For finding the probability P of correctness
of the $H_0$, the computed F is compared with
critical F values ($F_\alpha$) for chosen levels of
significance ($\alpha$ not exceeding 0.05). The
sampling distributions of F values, and so their
probability distributions also, depend for their
forms on the $df_b$ and $df_w$ of the greater and
lesser mean squares used in computing the F. F
distributions are reverse J-shaped for very small
degrees of freedom, but change into unimodal
distributions with progressively declining
skewness with the rise in the degrees of
freedom. So, the appropriate $F_\alpha$ has to be
found out in accordance with the df values of
the computed F. Moreover, being a ratio of two
variances, F cannot be negative ; so, the
rejection region ($\alpha$) of the F distribution is
always the fractional area beyond the critical
$F_\alpha$ in the right tail of the specific F distribution
for the given df values (Fig. 11.1). If the
computed F either equals or exceeds the
critical $F_\alpha$ (Table H of Appendix), the
probability P of correctness of the $H_0$ is
considered too low ($P \le \alpha$) and the $H_0$ cannot
be retained ; the computed F is then
significant and it is inferred that there are
significant differences between the group
means. But if the computed F is lower than the
critical $F_\alpha$, the $H_0$ cannot be rejected ($P > \alpha$) ;
the computed F is not significant and there
is no significant difference between the group
means.

Fig. 11.1. Fractional area ($\alpha$) beyond a critical F
value in an F distribution.
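
The text finds critical F values from a printed table. As a hedged alternative, the fractional area beyond a computed F in the right tail can be approximated numerically. The sketch below is an assumption-laden illustration, not the book's method: the function name `f_upper_tail` is hypothetical, and it uses the standard identity that the right-tail area equals the regularized incomplete beta function at $x = df_w / (df_w + df_b F)$, evaluated here by Simpson's rule. It is written for F > 0 and $df_w \ge 2$ only.

```python
import math


def f_upper_tail(f, df_b, df_w, steps=20000):
    """Approximate the right-tail area P beyond f in an F distribution.

    Assumes f > 0, df_b >= 1 and df_w >= 2; `steps` must be even.
    """
    if f <= 0 or df_b < 1 or df_w < 2 or steps % 2:
        raise ValueError("sketch supports f > 0, df_b >= 1, df_w >= 2, even steps")
    a, b = df_w / 2.0, df_b / 2.0
    x = df_w / (df_w + df_b * f)          # upper limit of integration, 0 < x < 1

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    # Simpson's rule for the incomplete beta integral over [0, x].
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0

    # Divide by the complete beta function B(a, b).
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)


# Tail areas beyond two tabled critical values quoted in the text for df = 1, 18.
print(f_upper_tail(4.41, 1, 18), f_upper_tail(8.28, 1, 18))
```

The two printed areas should lie close to 0.05 and 0.01 respectively, which is how the tabled critical values are defined.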

In one-way anova between more than two
groups, a significant F ratio merely indicates
significant differences between some or all of
the groups, and must be followed by multiple
comparison tests to find which specific groups
differ significantly from each other ; some such
cases of anova are cited in Examples under the
next section (§ 11.4). But in one-way anova
between two groups only, a significant F leads
straightway to the inference that the two group
means differ significantly, and so, needs no
further follow-up by multiple comparison tests ;
some such cases of anova are cited in
Examples 11.3.1-11.3.3. However, in both types
of cases, a significant F ratio should be
followed by the computation of either the
omega square in case of model I anova, or the
added variance component in case of model II
anova.

On the contrary, if the computed F is not 
significant, neither any multiple comparison test 
nor the computation of omega square or added 
variance component is to follow. 

Omega square 

The independent variable in a model I
anova is a “fixed” experimental treatment
under the control of the investigator and so,
does not suffer from random errors. So, if a
model I anova yields a significant F, the
strength of association between the independent
variable and the dependent variable can be
estimated by working out the omega square
($\omega^2$). The value of the computed $\omega^2$ gives that
proportion of the total variance of the
dependent variable which is associated with the
independent variable.

$$\omega^2 = \frac{(k - 1)(F - 1)}{(k - 1)(F - 1) + N}$$
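
The omega square formula above may be sketched in Python as follows. The function name `omega_square` is hypothetical; the two calls use the F ratios, numbers of groups and total sample sizes worked out in Examples 11.3.2 and 11.3.3 of this section.

```python
def omega_square(f, k, n_total):
    """Omega square from the F ratio, the number of groups k and the total N."""
    effect = (k - 1) * (f - 1)
    return effect / (effect + n_total)


# F = 19.04 with k = 2 and N = 20 ; F = 10.43 with k = 2 and N = 21.
print(round(omega_square(19.04, 2, 20), 2))
print(round(omega_square(10.43, 2, 21), 2))
```

The rounded values agree with the proportions of 0.47 and 0.31 reported in those two examples.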

Added variance component 

In model II anova, the uncontrolled
independent variable suffers from random
errors and so, the strength of association
between the independent variable and the
dependent variable cannot be estimated here.
So, instead of computing omega square, a
significant F in model II anova is followed by
working out an estimate of the added variance
component ($s_a^2$) in the $s_b^2$.

(a) Where all the groups are of equal size
(n),

$$s_a^2 = \frac{s_b^2 - s_w^2}{n} .$$

( b ) But if there are k number of groups in 
the experiment and they differ in size from 
each other. 



In either case, proportionate variations are
then worked out for $s_a^2$ and $s_w^2$ as follows :

(i) for $s_a^2$ : $\dfrac{s_a^2}{s_w^2 + s_a^2}$ ;

(ii) for $s_w^2$ : $\dfrac{s_w^2}{s_w^2 + s_a^2}$.
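
For groups of equal size, the added variance component and its proportionate variation can be sketched in Python as follows. The function name `added_variance_component` is hypothetical, and the sketch covers only the equal-size case ; the call uses the mean squares of 5.00 and 0.69 and the group size of 10 from Example 11.3.1.

```python
def added_variance_component(ms_b, ms_w, n):
    """Return s_a^2 and the proportions of variation due to s_a^2 and s_w^2.

    Valid only when every group has the same size n (equal-size case).
    """
    s_a2 = (ms_b - ms_w) / n          # added variance component
    total = ms_w + s_a2
    return s_a2, s_a2 / total, ms_w / total


s_a2, prop_added, prop_error = added_variance_component(5.00, 0.69, 10)
print(round(s_a2, 2), round(prop_added, 2), round(prop_error, 2))
```

The two proportions always add up to 1, since together they account for the whole of $s_w^2 + s_a^2$.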



Example 11.3.1.

Apply one-way anova to find whether or not there is a significant difference between the mean winglength
scores of the following two groups of flies sampled from two different habitats.

Group 1 : 3.8, 4.3, 4.6, 5.1, 4.6, 5.3, 4.5, 2.8, 4.0, 5.0.

Group 2 : 3.0, 4.6, 3.1, 3.7, 2.8, 4.5, 3.0, 2.2, 2.4, 4.7.

Solution :

As the independent variable consists of an uncontrolled classification variable, viz., habitat, a one-
way model II anova is applicable here.

(a) Partitioning of sums of squares :

The data are entered in the first two columns of Table 11.1 and used in working out $\Sigma X_1$, $\Sigma X_2$, $\Sigma X_1^2$ and
$\Sigma X_2^2$.

Table 11.1. Table for working out sums of squares directly from raw winglength scores.

        Group 1 ($X_1$)   Group 2 ($X_2$)   $X_1^2$      $X_2^2$
        3.8               3.0               14.44        9.00
        4.3               4.6               18.49        21.16
        4.6               3.1               21.16        9.61
        5.1               3.7               26.01        13.69
        4.6               2.8               21.16        7.84
        5.3               4.5               28.09        20.25
        4.5               3.0               20.25        9.00
        2.8               2.2               7.84         4.84
        4.0               2.4               16.00        5.76
        5.0               4.7               25.00        22.09
Total   44.0 ($\Sigma X_1$)     34.0 ($\Sigma X_2$)     198.44 ($\Sigma X_1^2$)   123.24 ($\Sigma X_2^2$)

$n_1 = 10$ ; $n_2 = 10$ ; $N = n_1 + n_2 = 10 + 10 = 20$.

$$SS_t = \Sigma X_1^2 + \Sigma X_2^2 - \frac{(\Sigma X_1 + \Sigma X_2)^2}{N} = 198.44 + 123.24 - \frac{(44.0 + 34.0)^2}{20} = 17.48.$$

$df_t = N - 1 = 20 - 1 = 19$.

$$SS_b = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} - \frac{(\Sigma X_1 + \Sigma X_2)^2}{N} = \frac{(44.0)^2}{10} + \frac{(34.0)^2}{10} - \frac{(44.0 + 34.0)^2}{20} = 5.00.$$

$df_b = k - 1 = 2 - 1 = 1$.

$SS_w = SS_t - SS_b = 17.48 - 5.00 = 12.48$.
$df_w = N - k = 20 - 2 = 18$.

Alternatively, the data are entered in the first two columns of Table 11.2 and used in working out $\Sigma X_1$,
$\Sigma X_2$, $\Sigma (X_1 - \bar{X})^2$ and $\Sigma (X_2 - \bar{X})^2$.

Table 11.2. Table for working out sums of squares using the means of winglength scores.

        $X_1$     $X_2$     $X_1 - \bar{X}$   $(X_1 - \bar{X})^2$   $X_2 - \bar{X}$   $(X_2 - \bar{X})^2$
        3.8     3.0     -0.1      0.01         -0.9      0.81
        4.3     4.6     0.4       0.16         0.7       0.49
        4.6     3.1     0.7       0.49         -0.8      0.64
        5.1     3.7     1.2       1.44         -0.2      0.04
        4.6     2.8     0.7       0.49         -1.1      1.21
        5.3     4.5     1.4       1.96         0.6       0.36
        4.5     3.0     0.6       0.36         -0.9      0.81
        2.8     2.2     -1.1      1.21         -1.7      2.89
        4.0     2.4     0.1       0.01         -1.5      2.25
        5.0     4.7     1.1       1.21         0.8       0.64
Total   44.0    34.0              7.34                   10.14

$n_1 = 10$ ; $n_2 = 10$ ; $N = n_1 + n_2 = 10 + 10 = 20$.

$$\bar{X}_1 = \frac{\Sigma X_1}{n_1} = \frac{44.0}{10} = 4.4 ; \quad \bar{X}_2 = \frac{\Sigma X_2}{n_2} = \frac{34.0}{10} = 3.4 ; \quad \bar{X} = \frac{\Sigma X_1 + \Sigma X_2}{N} = \frac{44.0 + 34.0}{20} = 3.9.$$

$SS_t = \Sigma (X_1 - \bar{X})^2 + \Sigma (X_2 - \bar{X})^2 = 7.34 + 10.14 = 17.48$. $df_t = N - 1 = 20 - 1 = 19$.

$SS_b = n_1(\bar{X}_1 - \bar{X})^2 + n_2(\bar{X}_2 - \bar{X})^2 = 10(4.4 - 3.9)^2 + 10(3.4 - 3.9)^2 = 5.00$.

$df_b = k - 1 = 2 - 1 = 1$.

$SS_w = SS_t - SS_b = 17.48 - 5.00 = 12.48$.
$df_w = N - k = 20 - 2 = 18$.

(b) Computation of variances and F ratio :

$$s_b^2 = \frac{SS_b}{df_b} = \frac{5.00}{1} = 5.00 ; \quad s_w^2 = \frac{SS_w}{df_w} = \frac{12.48}{18} = 0.69 ; \quad F = \frac{s_b^2}{s_w^2} = \frac{5.00}{0.69} = 7.25.$$

Table 11.3. Anova table for winglength data.

Sources of variation    Sums of squares    df    Variances    F
Between groups          5.00               1     5.00         7.25
Within groups           12.48              18    0.69

(c) Significance of F :

Critical F values ($df_b$ = 1, $df_w$ = 18) are quoted from Table H of Appendix, for two-tail test.

$F_{.05(1,18)} = 4.41$ ; $F_{.01(1,18)} = 8.28$.

As the computed F exceeds the critical F for 0.05 level, the probability P of the $H_0$ being
correct is considered too low ; so, it is inferred that there is a significant added variance component between
the groups, and that the two group means differ significantly (P < 0.05).

(d) Added variance component :

Size of each group (n) = 10.

$$s_a^2 = \frac{s_b^2 - s_w^2}{n} = \frac{5.00 - 0.69}{10} = 0.43.$$

$$\frac{s_a^2}{s_w^2 + s_a^2} = \frac{0.43}{0.69 + 0.43} = 0.38.$$


Example 11.3.2. 


Apply one-way anova to find whether or not there is a significant difference between the mean tracheal
ventilation scores (ml/min) of the following two groups of beetles treated respectively with two different
doses of a pesticide.

Group 1 : 80, 81, 75, 80, 88, 70, 74, 71, 84, 72.

Group 2 : 70, 74, 68, 67, 72, 59, 61, 57, 68, 54.

Solution :

Because the independent variable consists of a “fixed” treatment variable in the form of controlled doses
of a pesticide, a one-way model I anova is applicable here.

(a) Partitioning of sums of squares :

The data are entered in the first two columns of Table 11.4 and used for working out $\Sigma X_1$, $\Sigma X_2$,
$\Sigma (X_1 - \bar{X})^2$ and $\Sigma (X_2 - \bar{X})^2$.

$n_1 = 10$ ; $n_2 = 10$ ; $N = n_1 + n_2 = 10 + 10 = 20$.

$$\bar{X}_1 = \frac{\Sigma X_1}{n_1} = \frac{775}{10} = 77.50 ; \quad \bar{X}_2 = \frac{\Sigma X_2}{n_2} = \frac{650}{10} = 65.00 ; \quad \bar{X} = \frac{\Sigma X_1 + \Sigma X_2}{N} = \frac{775 + 650}{20} = 71.25.$$

$SS_t = \Sigma (X_1 - \bar{X})^2 + \Sigma (X_2 - \bar{X})^2 = 715.125 + 804.625 = 1519.75$.
$df_t = N - 1 = 20 - 1 = 19$.

$SS_b = n_1(\bar{X}_1 - \bar{X})^2 + n_2(\bar{X}_2 - \bar{X})^2 = 10(77.50 - 71.25)^2 + 10(65.00 - 71.25)^2 = 781.25$.

$df_b = k - 1 = 2 - 1 = 1$.

$SS_w = SS_t - SS_b = 1519.75 - 781.25 = 738.50$. $df_w = N - k = 20 - 2 = 18$.

Table 11.4. Table for computing sums of squares using the means of tracheal ventilation scores.

        Group 1 ($X_1$)   Group 2 ($X_2$)   $X_1 - \bar{X}$   $(X_1 - \bar{X})^2$   $X_2 - \bar{X}$   $(X_2 - \bar{X})^2$
        80              70              8.75      76.5625      -1.25     1.5625
        81              74              9.75      95.0625      2.75      7.5625
        75              68              3.75      14.0625      -3.25     10.5625
        80              67              8.75      76.5625      -4.25     18.0625
        88              72              16.75     280.5625     0.75      0.5625
        70              59              -1.25     1.5625       -12.25    150.0625
        74              61              2.75      7.5625       -10.25    105.0625
        71              57              -0.25     0.0625       -14.25    203.0625
        84              68              12.75     162.5625     -3.25     10.5625
        72              54              0.75      0.5625       -17.25    297.5625
$\Sigma$       775             650                       715.1250               804.6250

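Alternatively, the data are entered in the first two columns of Table 11.5 and used in working out $\Sigma X_1$,
$\Sigma X_2$, $\Sigma X_1^2$ and $\Sigma X_2^2$.

Table 11.5. Table for computing sums of squares directly from raw tracheal ventilation scores.

        Group 1 ($X_1$)   Group 2 ($X_2$)   $X_1^2$     $X_2^2$
        80              70              6400      4900
        81              74              6561      5476
        75              68              5625      4624
        80              67              6400      4489
        88              72              7744      5184
        70              59              4900      3481
        74              61              5476      3721
        71              57              5041      3249
        84              68              7056      4624
        72              54              5184      2916
$\Sigma$       775             650             60387     42664

$n_1 = 10$ ; $n_2 = 10$ ; $N = n_1 + n_2 = 10 + 10 = 20$.

$$SS_t = \Sigma X_1^2 + \Sigma X_2^2 - \frac{(\Sigma X_1 + \Sigma X_2)^2}{N} = 60387 + 42664 - \frac{(775 + 650)^2}{20} = 1519.75.$$

$df_t = N - 1 = 20 - 1 = 19$.

$$SS_b = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} - \frac{(\Sigma X_1 + \Sigma X_2)^2}{N} = \frac{(775)^2}{10} + \frac{(650)^2}{10} - \frac{(775 + 650)^2}{20} = 781.25.$$

$df_b = k - 1 = 2 - 1 = 1$.

$SS_w = SS_t - SS_b = 1519.75 - 781.25 = 738.50$. $df_w = N - k = 20 - 2 = 18$.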

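(b) Computation of variances and F ratio :

$$s_b^2 = \frac{SS_b}{df_b} = \frac{781.25}{1} = 781.25 ; \quad s_w^2 = \frac{SS_w}{df_w} = \frac{738.50}{18} = 41.03 ; \quad F = \frac{s_b^2}{s_w^2} = \frac{781.25}{41.03} = 19.04.$$

Table 11.6. Anova table for tracheal ventilation data.

Sources of variation    Sums of squares    df    Variances    F
Between groups          781.25             1     781.25       19.04
Within groups           738.50             18    41.03
Total                   1519.75            19

(c) Significance of F :

Critical F values ($df_b$ = 1, $df_w$ = 18) are quoted from Table H of Appendix, for two-tail test.

$F_{.05(1,18)} = 4.41$ ; $F_{.01(1,18)} = 8.28$.

As the computed F exceeds even the critical F for 0.01 level, the probability P for correctness of the $H_0$
is considered too low. So, it is inferred that there is a significant added treatment component between the
groups, and the two groups differ significantly (P < 0.01).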




(d) Omega square :

As the model I anova has a significant F ratio, the strength of association is estimated between the
independent variable and the dependent variable by working out the omega square.

$$\omega^2 = \frac{(k - 1)(F - 1)}{(k - 1)(F - 1) + N} = \frac{(2 - 1)(19.04 - 1)}{(2 - 1)(19.04 - 1) + 20} = 0.47.$$

So, a proportion of 0.47 of the total variance of tracheal ventilation is associated with the pesticide used
as the treatment variable.


Example 11.3.3.

Following are the performance test scores of two groups of students after they have practised according
to two respective practice schedules given to them. Do the groups differ significantly in performance due to
their different practice schedules ?

Group 1 : 13, 20, 21, 20, 18, 12, 15, 23, 17, 13.

Group 2 : 28, 16, 30, 24, 20, 14, 32, 30, 20, 27, 28.

Solution :

A one-way model I anova is appropriate here because the two practice schedules constitute a controlled
(“fixed”) treatment variable.

(a) Partitioning of sums of squares :

The data are entered in the first two columns of Table 11.7 and used in working out $\Sigma X_1$, $\Sigma X_2$, $\Sigma X_1^2$ and
$\Sigma X_2^2$.

Table 11.7. Table for computing sums of squares directly from raw scores of performance test.

        Group 1 ($X_1$)   Group 2 ($X_2$)   $X_1^2$    $X_2^2$
        13              28              169      784
        20              16              400      256
        21              30              441      900
        20              24              400      576
        18              20              324      400
        12              14              144      196
        15              32              225      1024
        23              30              529      900
        17              20              289      400
        13              27              169      729
                        28                       784
Total   172             269             3090     6949

$n_1 = 10$ ; $n_2 = 11$ ; $N = n_1 + n_2 = 10 + 11 = 21$.


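$$SS_t = \Sigma X_1^2 + \Sigma X_2^2 - \frac{(\Sigma X_1 + \Sigma X_2)^2}{N} = 3090 + 6949 - \frac{(172 + 269)^2}{21} = 778.00.$$

$df_t = N - 1 = 21 - 1 = 20$.

$$SS_b = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} - \frac{(\Sigma X_1 + \Sigma X_2)^2}{N} = \frac{(172)^2}{10} + \frac{(269)^2}{11} - \frac{(172 + 269)^2}{21} = 275.67.$$

$df_b = k - 1 = 2 - 1 = 1$.

$df_w = N - k = 21 - 2 = 19$.

$SS_w = SS_t - SS_b = 778.00 - 275.67 = 502.33$.

(b) Computation of variances and F ratio :

$$s_b^2 = \frac{SS_b}{df_b} = \frac{275.67}{1} = 275.67 ; \quad s_w^2 = \frac{SS_w}{df_w} = \frac{502.33}{19} = 26.44 ; \quad F = \frac{s_b^2}{s_w^2} = \frac{275.67}{26.44} = 10.43.$$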


Table 11.8. Anova table for performance score data.

Sources of variation    Sums of squares    df    Variances    F
Between groups          275.67             1     275.67       10.43
Within groups           502.33             19    26.44
Total                   778.00             20


(c) Significance of F : 

Two-tail critical F values ($df_b$ = 1, $df_w$ = 19) are quoted from Table H of Appendix.

$F_{.05(1,19)} = 4.38$ ; $F_{.01(1,19)} = 8.18$.

As the computed F exceeds even the critical F for 0.01 level, the probability P of the $H_0$ being correct
is considered too low. So, it is inferred that there is a significant added treatment component between the
groups and the groups differ significantly (P < 0.01).

(d) Omega square :

As the model I anova has a significant F ratio, omega square is worked out to estimate the strength of
association between the independent variable and the dependent variable.

$$\omega^2 = \frac{(k - 1)(F - 1)}{(k - 1)(F - 1) + N} = \frac{(2 - 1)(10.43 - 1)}{(2 - 1)(10.43 - 1) + 21} = 0.31.$$

So, a proportion of 0.31 of the total variance of the performance test scores is associated with the
practice schedule used as the treatment variable.
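
The raw-score computation of Example 11.3.3, with its two groups of unequal size, may be checked with a short Python sketch such as the following. The function name `sums_of_squares_from_raw` and the variable name `correction_term` are hypothetical labels introduced here ; the scores are those of the example.

```python
def sums_of_squares_from_raw(groups):
    """Return SS_t, SS_b and SS_w computed directly from raw scores."""
    n_total = sum(len(g) for g in groups)
    # (sum of all scores)^2 / N, the term subtracted in both raw-score formulas.
    correction_term = sum(sum(g) for g in groups) ** 2 / n_total
    ss_t = sum(x * x for g in groups for x in g) - correction_term
    ss_b = sum(sum(g) ** 2 / len(g) for g in groups) - correction_term
    return ss_t, ss_b, ss_t - ss_b


group_1 = [13, 20, 21, 20, 18, 12, 15, 23, 17, 13]
group_2 = [28, 16, 30, 24, 20, 14, 32, 30, 20, 27, 28]

ss_t, ss_b, ss_w = sums_of_squares_from_raw([group_1, group_2])
df_b = 2 - 1
df_w = len(group_1) + len(group_2) - 2
f_ratio = (ss_b / df_b) / (ss_w / df_w)
print(round(ss_t, 2), round(ss_b, 2), round(ss_w, 2), round(f_ratio, 2))
```

Because the sketch carries unrounded intermediate values, its F ratio may differ in the last decimal place from a hand computation that rounds the variances first.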


11.4 MULTIPLE COMPARISON TESTS 

If the F ratio is found to be significant in 
an anova with more than two groups, it should 
be followed by a multiple comparison test to 
find which group means differ significantly 
from each other. There are two types of 
multiple comparison tests. 


Before-design or a-priori comparisons :

In some cases, even before commencing the
experiment and collecting the data, the
investigator can plan which group means
should be subjected to a test for finding the
significance of their differences. Statistical tests
for such pre-planned comparisons of chosen
group means are called a-priori or before-
design comparisons ; e.g., multiple comparison
t test and Scheffe’s F test. The choice of group
means to be compared, and of the test to be
applied, depends on the purpose and design of
the experiment in such cases, not on the
examination and scrutiny of the data after the
experiment is over.


After-design or a-posteriori comparisons : 

In some cases, however, nothing can be so 
planned initially about which group means 
should be finally compared. After completion 
of the experiment in such cases, the data are 
subjected to a preliminary F test (§ 11.3) to 
find whether there is any significant difference 
at all between the group means of the data. If 
the F ratio is found to be significant, an after-
design or a-posteriori comparison like Gabriel's
sum of squares simultaneous test procedure
(SS-STP) may be applied to explore the group
means for significant differences.


Multiple comparison t test 

Student’s t is computed from the difference
between the means of each pair of groups
chosen before the experiment has been actually
performed. Thus, this is a before-design or a-
priori comparison. If the anova has yielded a
significant F ratio, one multiple comparison t
test has to be undertaken for each chosen pair
of group means. For each such test, the
difference between the group means of the
chosen pair, say ($\bar{X}_1 - \bar{X}_2$), is converted to t
using the SE of their difference ($s_{\bar{X}_1 - \bar{X}_2}$) ;
the latter is computed using the within-
groups variance or error variance ($s_w^2$) used
earlier in calculating the F ratio. Thus, for the
difference ($\bar{X}_1 - \bar{X}_2$),

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_w^2 (n_1 + n_2)}{n_1 n_2}} ; \quad t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} ; \quad df = N - k,$$

where $n_1$ and $n_2$ are the sizes of the respective
groups, N is the total of all the group sizes in
the experiment, k is the number of groups in
the experiment, and (N - k) is the $df_w$ of the
within-groups variance or lesser mean square.

If the computed t exceeds or equals the
critical t score (df = N - k) for the chosen level
of significance ($\alpha$), the relevant group means
are considered to differ significantly ($P \le \alpha$).
This procedure is repeated for group means of
each chosen pair, using $s_w^2$ and the sizes of
those two groups.
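
The multiple comparison t for one chosen pair of groups may be sketched in Python as follows. The function name `pairwise_t` and the numerical values in the call are hypothetical, used only to show the arithmetic ; the standard error is written in the equivalent form $\sqrt{s_w^2 (1/n_1 + 1/n_2)}$.

```python
import math


def pairwise_t(mean_1, mean_2, n_1, n_2, ms_w):
    """t for the difference between two group means, using the anova error variance."""
    # SE of the difference, from the within-groups variance s_w^2 of the whole anova.
    se_diff = math.sqrt(ms_w * (1.0 / n_1 + 1.0 / n_2))
    return (mean_1 - mean_2) / se_diff


# Hypothetical group means, sizes and s_w^2, for illustration only.
t_value = pairwise_t(6.0, 2.0, 3, 3, 1.0)
print(round(t_value, 3))
```

The computed t is then compared with the critical t for df = N - k of the whole experiment, not for the df of the two groups alone.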

t test with Bonferroni adjustment 

In the Bonferroni method, the difference between the means of each chosen pair of groups is converted as above to a t score, using $s_w^2$ and the group sizes. The probability level (P) for each such computed t score is found out as above, by comparing the latter with the critical t scores having the same df. This P is then multiplied by the total number (k′) of pairs of groups being compared, to find the adjusted probability (P′). If this P′ is lower than or equal to either a chosen α or the α of 0.05, the relevant group means are considered to differ significantly from each other at or beyond that α (P′ ≤ α).
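The Bonferroni adjustment itself is a single multiplication. A minimal sketch follows; the function name `bonferroni_adjust` is hypothetical, and capping the adjusted probability at 1 is an added assumption that the text does not state.

```python
def bonferroni_adjust(p, n_pairs):
    """Adjusted probability P' = k' * P for one of k' paired comparisons.

    The cap at 1.0 is an assumption added here so that P' stays a probability.
    """
    return min(1.0, n_pairs * p)

# Two chosen pairs (k' = 2); unadjusted P values below 0.02 and 0.001.
print(bonferroni_adjust(0.02, 2), bonferroni_adjust(0.001, 2))
```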

Scheffe’s F test 

It is a powerful a-priori multiple 
comparison test, applicable in spite of minor 
degrees of skewness and heterogeneity of the 
distribution of scores, and inequality of the 
group sizes. It restricts type I errors of 
inference considerably, but enhances the type II 
errors. 

For this test, F is computed from the difference between the means of each pair of groups, chosen prior to the experiment, using $s_w^2$ and the sizes of the chosen groups. Thus, for the difference $(\bar{X}_1 - \bar{X}_2)$,

$$s_{\bar{X}_1-\bar{X}_2}^2 = \frac{s_w^2}{n_1} + \frac{s_w^2}{n_2}\,; \qquad F = \left(\frac{\bar{X}_1-\bar{X}_2}{s_{\bar{X}_1-\bar{X}_2}}\right)^2 = \frac{(\bar{X}_1-\bar{X}_2)^2}{s_w^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}.$$




Each computed F is then compared with critical F′ values for the 0.05 and 0.01 levels of significance. Each critical F′ has to be computed by multiplying (k - 1) with the critical F ($df_b$, $df_w$) for the given level of α, taken from Table H of Appendix, k being the total number of groups being compared. For example, where only two of the group means are being compared, and $s_b^2$ and $s_w^2$ have respectively 4 and 26 as their df, critical F′ values for the 0.05 and 0.01 levels are given by :

$$F'_{.05} = (k-1)\,F_{.05(4,26)} = (2-1) \times 2.74 = 2.74\,;$$

$$F'_{.01} = (k-1)\,F_{.01(4,26)} = (2-1) \times 4.14 = 4.14.$$

If the computed F of any pair of group means exceeds or equals the critical F′ value, the difference between those group means is significant at or beyond that α (P ≤ α). This procedure is repeated for all other chosen pairs of group means.
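Scheffe's F for a pair of means and the corresponding critical F′ can be sketched as follows. The function names are hypothetical; the numerical inputs are borrowed from the blood sugar data of Example 11.4.1, and the critical F of 3.35 is the tabulated value for df = 2, 27 at the 0.05 level.

```python
def scheffe_f(mean1, mean2, n1, n2, s2_w):
    """Scheffe's F for one pair of group means, using the within-groups variance."""
    return (mean1 - mean2) ** 2 / (s2_w * (1 / n1 + 1 / n2))

def scheffe_critical(k, f_crit):
    """Critical F' = (k - 1) times the tabulated critical F (df_b, df_w)."""
    return (k - 1) * f_crit

f_12 = scheffe_f(126.7, 118.5, 10, 10, 46.67)
f_crit_05 = scheffe_critical(3, 3.35)
print(round(f_12, 2), round(f_crit_05, 2), f_12 >= f_crit_05)
```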

Gabriel’s SS-STP 

The sum of squares simultaneous test procedure is used as an a-posteriori test.

(a) First, the critical sum of squares ($SS_\alpha$) is computed, using the critical F value ($F_\alpha$) for the chosen significance level (α) and with the degrees of freedom ($df_b$ and $df_w$) of the greater and lesser mean squares, where k is the total number of groups and $s_w^2$ is the within-groups variance.

$$SS_\alpha = (k-1)\, s_w^2\, F_{\alpha(df_b,\,df_w)}.$$

(b) The group means are computed from the obtained data and scrutinized to identify those group means which apparently look so close as to raise doubts about any significant difference between them. The between-groups sum of squares ($SS_b$) is computed between those groups. If, for example, the means of groups 2, 3 and 5 appear to be too close in the data obtained in an experiment, $SS_b$ is computed between them, using the sums of scores ($\Sigma X_2$, $\Sigma X_3$, $\Sigma X_5$) of the respective groups and the respective group sizes ($n_2$, $n_3$, $n_5$).

$$SS_b = \frac{(\Sigma X_2)^2}{n_2} + \frac{(\Sigma X_3)^2}{n_3} + \frac{(\Sigma X_5)^2}{n_5} - \frac{(\Sigma X_2 + \Sigma X_3 + \Sigma X_5)^2}{n_2+n_3+n_5}.$$

(c) The computed $SS_b$ is next compared with the computed $SS_\alpha$. There is a significant difference between the chosen group means ($\bar{X}_2$, $\bar{X}_3$ and $\bar{X}_5$ in this case) only if the computed $SS_b$ equals or exceeds the $SS_\alpha$ (P ≤ α).

(d) The data are next inspected to find if any other group mean apparently differs considerably from the group means tested in the previous steps. $SS_b$ is then computed with that group mean (say, group 4 in the example cited).

$$SS_b = \frac{(\Sigma X_4)^2}{n_4} + \frac{(\Sigma X_2+\Sigma X_3+\Sigma X_5)^2}{n_2+n_3+n_5} - \frac{(\Sigma X_4+\Sigma X_2+\Sigma X_3+\Sigma X_5)^2}{n_4+n_2+n_3+n_5}.$$

This computed $SS_b$ is next compared with the $SS_\alpha$ computed in step (a). Only if this $SS_b$ equals or exceeds the $SS_\alpha$, the group mean being tested differs significantly from the other group means tested in (b) and (c) above.
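The two quantities compared in the SS-STP can be sketched in Python as below. The function names are hypothetical, and the inputs are used only for illustration: they are the score sums of groups 1 and 2 of the blood sugar data in Example 11.4.1 (1267 and 1185, ten scores each), with a within-groups variance of 46.67 and a tabulated critical F of 3.35 for df = 2, 27.

```python
def critical_ss(k, s2_w, f_crit):
    """Critical sum of squares: (k - 1) * within-groups variance * critical F."""
    return (k - 1) * s2_w * f_crit

def between_ss(sums, sizes):
    """Between-groups SS for a chosen subset of groups.

    sums  -- sum of scores of each chosen group
    sizes -- number of scores in each chosen group
    """
    total, n = sum(sums), sum(sizes)
    return sum(s * s / m for s, m in zip(sums, sizes)) - total * total / n

ss_alpha = critical_ss(3, 46.67, 3.35)
ss_b = between_ss([1267, 1185], [10, 10])
# The chosen group means differ significantly only if ss_b >= ss_alpha.
print(round(ss_b, 2), round(ss_alpha, 2), ss_b >= ss_alpha)
```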



Example 11.4.1.

Blood sugar (mg dL⁻¹) was estimated in three groups of animals one hour after injecting the groups 1, 2 and 3 with respectively the placebo, 100 μg of an anti-diabetic substance and 150 μg of the latter. The blood sugar scores are recorded in the first three columns of Table 11.9. Is there any significant difference between the means of groups 1 and 2, and between those of groups 2 and 3 ? Also, estimate the strength of association between blood sugar and the anti-diabetic substance, if the F ratio is significant.

Solution : 

Because the independent variable consists of “fixed” treatments with different levels of the anti-diabetic 
substance, a one-way model I anova is applicable here. 


(a) Partitioning of sums of squares : 

The data entered in the first three columns of Table 11.9 are used in working out $\Sigma X_1$, $\Sigma X_2$, $\Sigma X_3$, $\Sigma X_1^2$, $\Sigma X_2^2$ and $\Sigma X_3^2$.


Table 11.9. Table for computing sums of squares of blood sugar data.

      Blood sugar scores
      X₁       X₂       X₃        X₁²       X₂²       X₃²

     128      118       88      16384     13924      7744
     124      115       80      15376     13225      6400
     129      122       84      16641     14884      7056
     135      128       96      18225     16384      9216
     132      125       86      17424     15625      7396
     128      117       87      16384     13689      7569
     118      110       96      13924     12100      9216
     123      116       78      15129     13456      6084
     117      108       98      13689     11664      9604
     133      126       99      17689     15876      9801

Σ   1267     1185      892     160865    140827     80086


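$n_1 = 10$ ; $n_2 = 10$ ; $n_3 = 10$ ; $N = n_1 + n_2 + n_3 = 10 + 10 + 10 = 30$.

$$\bar{X}_1 = \frac{\Sigma X_1}{n_1} = \frac{1267}{10} = 126.7 \text{ mg}\,; \quad \bar{X}_2 = \frac{\Sigma X_2}{n_2} = \frac{1185}{10} = 118.5 \text{ mg}\,; \quad \bar{X}_3 = \frac{\Sigma X_3}{n_3} = \frac{892}{10} = 89.2 \text{ mg}.$$

$$SS_t = \Sigma X_1^2 + \Sigma X_2^2 + \Sigma X_3^2 - \frac{(\Sigma X_1+\Sigma X_2+\Sigma X_3)^2}{N} = 160865 + 140827 + 80086 - \frac{(1267+1185+892)^2}{30} = 9033.47.$$

$df_t = N - 1 = 30 - 1 = 29.$

$$SS_b = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \frac{(\Sigma X_3)^2}{n_3} - \frac{(\Sigma X_1+\Sigma X_2+\Sigma X_3)^2}{N} = \frac{(1267)^2}{10} + \frac{(1185)^2}{10} + \frac{(892)^2}{10} - \frac{(1267+1185+892)^2}{30} = 7773.27.$$

$df_b = k - 1 = 3 - 1 = 2.$

$SS_w = SS_t - SS_b = 9033.47 - 7773.27 = 1260.20.$

$df_w = N - k = 30 - 3 = 27.$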

(b) Computation of variances and F ratio :

$$s_b^2 = \frac{SS_b}{df_b} = \frac{7773.27}{2} = 3886.64\,; \quad s_w^2 = \frac{SS_w}{df_w} = \frac{1260.20}{27} = 46.67\,; \quad F = \frac{s_b^2}{s_w^2} = \frac{3886.64}{46.67} = 83.28.$$

df of computed F : $df_b$, $df_w$ = 2, 27.

Table 11.10. Anova table for blood sugar data.

Sources of variation    Sums of squares    df    Variances    F

Between groups              7773.27         2     3886.64     83.28
Within groups               1260.20        27       46.67
Total                       9033.47        29





(c) Significance of F :

Critical F values (df = 2, 27) are quoted below from Table H of Appendix.

$F_{.05(2,27)} = 3.35$ ; $F_{.01(2,27)} = 5.49$.

As the computed F is found to be higher than the critical F for the 0.01 level, the computed F is significant beyond the 0.01 level (P < 0.01). Hence, there is a significant added treatment component between the groups.
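The partitioning of sums of squares and the F ratio worked out above can be reproduced from the raw scores of Table 11.9 with a short Python sketch. The function name `one_way_anova` is hypothetical. Because the sketch carries unrounded intermediate values, its F ratio may differ from the hand-computed one in the second decimal place.

```python
def one_way_anova(groups):
    """Partition SS for a one-way anova; returns (SS_t, SS_b, SS_w, F)."""
    n_total = sum(len(g) for g in groups)
    grand_sum = sum(sum(g) for g in groups)
    correction = grand_sum ** 2 / n_total          # (sum of all scores)^2 / N
    ss_t = sum(x * x for g in groups for x in g) - correction
    ss_b = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_w = ss_t - ss_b
    df_b, df_w = len(groups) - 1, n_total - len(groups)
    f_ratio = (ss_b / df_b) / (ss_w / df_w)
    return ss_t, ss_b, ss_w, f_ratio

blood_sugar = [
    [128, 124, 129, 135, 132, 128, 118, 123, 117, 133],  # group 1 (placebo)
    [118, 115, 122, 128, 125, 117, 110, 116, 108, 126],  # group 2 (100 ug)
    [88, 80, 84, 96, 86, 87, 96, 78, 98, 99],            # group 3 (150 ug)
]
ss_t, ss_b, ss_w, f_ratio = one_way_anova(blood_sugar)
print(round(ss_t, 2), round(ss_b, 2), round(ss_w, 2), round(f_ratio, 2))
```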

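(d) Multiple comparison tests :

Scheffe's multiple comparison F test may be done to find out whether $(\bar{X}_1 - \bar{X}_2)$ and $(\bar{X}_2 - \bar{X}_3)$ are significant. As three group means are being compared, k = 3.

$$F = \frac{(\bar{X}_1-\bar{X}_2)^2}{s_w^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)} = \frac{(126.7-118.5)^2}{46.67\left(\dfrac{1}{10}+\dfrac{1}{10}\right)} = 7.20\,;$$

$$F = \frac{(\bar{X}_2-\bar{X}_3)^2}{s_w^2\left(\dfrac{1}{n_2}+\dfrac{1}{n_3}\right)} = \frac{(118.5-89.2)^2}{46.67\left(\dfrac{1}{10}+\dfrac{1}{10}\right)} = 91.97.$$

Critical $F'_{.01} = (k-1)\,F_{.01(2,27)} = (3-1) \times 5.49 = 10.98.$

Critical $F'_{.05} = (k-1)\,F_{.05(2,27)} = (3-1) \times 3.35 = 6.70.$

As the F ratio for $(\bar{X}_1 - \bar{X}_2)$ exceeds the critical $F'_{.05}$ but not the critical $F'_{.01}$, the difference between $\bar{X}_1$ and $\bar{X}_2$ is significant beyond the 0.05 level only (P < 0.05). But as the F ratio for $(\bar{X}_2 - \bar{X}_3)$ exceeds the critical $F'_{.01}$, the difference between $\bar{X}_2$ and $\bar{X}_3$ is significant beyond the 0.01 level (P < 0.01).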

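[Alternatively, multiple comparison t test may be worked out with Bonferroni modification.

$$t = \frac{\bar{X}_1-\bar{X}_2}{\sqrt{\dfrac{s_w^2}{n_1}+\dfrac{s_w^2}{n_2}}} = \frac{126.7-118.5}{\sqrt{\dfrac{46.67}{10}+\dfrac{46.67}{10}}} = 2.684\,; \qquad t = \frac{\bar{X}_2-\bar{X}_3}{\sqrt{\dfrac{s_w^2}{n_2}+\dfrac{s_w^2}{n_3}}} = \frac{118.5-89.2}{\sqrt{\dfrac{46.67}{10}+\dfrac{46.67}{10}}} = 9.590.$$

$df = N - k = 30 - 3 = 27.$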




Critical t scores from Table B of Appendix :

$t_{.02(27)} = 2.473$ ; $t_{.01(27)} = 2.771$ ; $t_{.001(27)} = 3.690$.

Comparing the computed t scores with the critical t scores, it is found that $(\bar{X}_1 - \bar{X}_2)$ is significant beyond the 0.02 level (P < 0.02) while $(\bar{X}_2 - \bar{X}_3)$ is significant beyond the 0.001 level (P < 0.001).

To apply the Bonferroni modification, each P obtained by the t test is multiplied by the number k′ of paired comparisons to get the corrected probability P′ of the $H_0$ being correct.

k′ = 2 ; P′ = k′P ;

for $(\bar{X}_1 - \bar{X}_2)$ : P′ = k′P = 2 × (< 0.02) = < 0.04 ;
for $(\bar{X}_2 - \bar{X}_3)$ : P′ = k′P = 2 × (< 0.001) = < 0.002.

Thus, $(\bar{X}_1 - \bar{X}_2)$ is significant beyond the 0.04 level (P′ < 0.04) while $(\bar{X}_2 - \bar{X}_3)$ is significant beyond the 0.002 level (P′ < 0.002).]

(e) Strength of association :

Omega square is computed to estimate the strength of association between blood sugar and the anti-diabetic factor. Using the F ratio computed in the model I anova and the number k of groups,

$$\omega^2 = \frac{(k-1)(F-1)}{(k-1)(F-1)+N} = \frac{(3-1)(83.28-1)}{(3-1)(83.28-1)+30} = 0.85.$$

So, a proportion of 0.85 of the total variance of blood sugar is associated with the anti-diabetic factor used as the treatment variable.
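The omega square computation above reduces to one line of arithmetic, sketched here in Python. The function name `omega_squared` is hypothetical; the inputs are the F ratio, number of groups and total sample size of this example.

```python
def omega_squared(f_ratio, k, n_total):
    """Strength of association from the model I anova F ratio."""
    effect = (k - 1) * (f_ratio - 1)
    return effect / (effect + n_total)

print(round(omega_squared(83.28, 3, 30), 2))
```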


Example 11.4.2 . 


The scores obtained in an abstract reasoning test by three groups of students, drawn from three different socioeconomic strata, are given in the first three columns of Table 11.11. Apply anova to find whether an added variance component is present in the variance between the groups due to random socioeconomic factors (α = 0.01).


Solution : 

The data, arranged in Table 11.11, are subjected to a one-way model II anova because the socioeconomic 
strata constitute a classification variable beyond the control of the investigator. 


(a) Partitioning of sums of squares : 

The data entered in the first three columns of Table 11.11 are used in working out the group means and sums of squares.




Table 11.11. Table for computing sums of squares from means of abstract reasoning scores.

     X₁    X₂    X₃    X₁ - X̄   (X₁ - X̄)²   X₂ - X̄   (X₂ - X̄)²   X₃ - X̄   (X₃ - X̄)²

     26    30    34     - 4.4      19.36     - 0.4       0.16     + 3.6      12.96
     27    34    35     - 3.4      11.56     + 3.6      12.96     + 4.6      21.16
     25    28    28     - 5.4      29.16     - 2.4       5.76     - 2.4       5.76
     26    29    27     - 4.4      19.36     - 1.4       1.96     - 3.4      11.56
     28    32    34     - 2.4       5.76     + 1.6       2.56     + 3.6      12.96
     30    31    33     - 0.4       0.16     + 0.6       0.36     + 2.6       6.76
     26    32    32     - 4.4      19.36     + 1.6       2.56     + 1.6       2.56
     29    34    33     - 1.4       1.96     + 3.6      12.96     + 2.6       6.76
     31    29    29     + 0.6       0.36     - 1.4       1.96     - 1.4       1.96
     32          34     + 1.6       2.56                          + 3.6      12.96
                 33                                               + 2.6       6.76

Σ   280   279   352               109.60                41.24               102.16


$n_1 = 10$ ; $n_2 = 9$ ; $n_3 = 11$ ; $N = n_1 + n_2 + n_3 = 10 + 9 + 11 = 30$.

$$\bar{X}_1 = \frac{\Sigma X_1}{n_1} = \frac{280}{10} = 28\,; \quad \bar{X}_2 = \frac{\Sigma X_2}{n_2} = \frac{279}{9} = 31\,; \quad \bar{X}_3 = \frac{\Sigma X_3}{n_3} = \frac{352}{11} = 32\,;$$

$$\bar{X} = \frac{\Sigma X_1+\Sigma X_2+\Sigma X_3}{N} = \frac{280+279+352}{30} = 30.4.$$

$$SS_t = \Sigma(X_1-\bar{X})^2 + \Sigma(X_2-\bar{X})^2 + \Sigma(X_3-\bar{X})^2 = 109.60 + 41.24 + 102.16 = 253.0\,; \quad df_t = N - 1 = 30 - 1 = 29.$$

$$SS_b = n_1(\bar{X}_1-\bar{X})^2 + n_2(\bar{X}_2-\bar{X})^2 + n_3(\bar{X}_3-\bar{X})^2 = 10(28-30.4)^2 + 9(31-30.4)^2 + 11(32-30.4)^2 = 89.0\,; \quad df_b = k - 1 = 3 - 1 = 2.$$

$$SS_w = SS_t - SS_b = 253.0 - 89.0 = 164.0\,; \quad df_w = N - k = 30 - 3 = 27.$$



Alternatively, the abstract reasoning scores are entered in Table 11.12 and used in working out $\Sigma X_1$, $\Sigma X_2$, $\Sigma X_3$, $\Sigma X_1^2$, $\Sigma X_2^2$ and $\Sigma X_3^2$.




Table 11.12. Table for computing sums of squares directly from raw abstract reasoning scores.

     X₁    X₂    X₃      X₁²     X₂²      X₃²

     26    30    34      676     900     1156
     27    34    35      729    1156     1225
     25    28    28      625     784      784
     26    29    27      676     841      729
     28    32    34      784    1024     1156
     30    31    33      900     961     1089
     26    32    32      676    1024     1024
     29    34    33      841    1156     1089
     31    29    29      961     841      841
     32          34     1024             1156
                 33                      1089

Σ   280   279   352     7892    8687    11338



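$$SS_t = \Sigma X_1^2 + \Sigma X_2^2 + \Sigma X_3^2 - \frac{(\Sigma X_1+\Sigma X_2+\Sigma X_3)^2}{N} = 7892 + 8687 + 11338 - \frac{(280+279+352)^2}{30} = 252.97\,;$$

$df_t = N - 1 = 30 - 1 = 29.$

$$SS_b = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \frac{(\Sigma X_3)^2}{n_3} - \frac{(\Sigma X_1+\Sigma X_2+\Sigma X_3)^2}{N} = \frac{(280)^2}{10} + \frac{(279)^2}{9} + \frac{(352)^2}{11} - \frac{(280+279+352)^2}{30} = 88.97\,;$$

$df_b = k - 1 = 3 - 1 = 2.$

$$SS_w = SS_t - SS_b = 252.97 - 88.97 = 164.0\,; \quad df_w = N - k = 30 - 3 = 27.$$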



(b) Computation of variances and F ratio :

$$s_b^2 = \frac{SS_b}{df_b} = \frac{88.97}{2} = 44.49\,; \quad s_w^2 = \frac{SS_w}{df_w} = \frac{164.0}{27} = 6.07\,; \quad F = \frac{s_b^2}{s_w^2} = \frac{44.49}{6.07} = 7.33.$$

Table 11.13. Anova table for abstract reasoning scores.

Sources of variation    Sums of squares    df    Variances    F

Between groups               88.97          2      44.49     7.33
Within groups               164.0          27       6.07
Total                       252.97         29









(c) Significance of F :

α = 0.01. Critical $F_{.01(2,27)}$ = 5.49 (Table H of Appendix).

As the computed F is higher than the critical F for the 0.01 level of significance, it is significant (P < 0.01). So, there is a significant added variance component ($s_a^2$) between the groups.

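(d) Added variance component :

$$s_a^2 = \frac{s_b^2 - s_w^2}{\dfrac{1}{k-1}\left[(n_1+n_2+n_3) - \dfrac{n_1^2+n_2^2+n_3^2}{n_1+n_2+n_3}\right]} = \frac{44.49 - 6.07}{\dfrac{1}{3-1}\left[(10+9+11) - \dfrac{10^2+9^2+11^2}{10+9+11}\right]} = 3.85.$$

Proportionate variation due to $s_a^2$ is given by :

$$\frac{s_a^2}{s_w^2+s_a^2} = \frac{3.85}{6.07+3.85} = 0.39.$$

Thus, a proportion of 0.39 of the variance of the dependent variable due to the uncontrolled factors is accounted for by the added variance component.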
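For groups of unequal size, the added variance component can be sketched in Python as follows. The function name `added_variance` and the intermediate name `n0` (the divisor built from the group sizes) are hypothetical labels; the inputs are the variances and group sizes of this example.

```python
def added_variance(s2_b, s2_w, sizes):
    """Added variance component and its share of the total, model II anova.

    sizes -- list of group sizes (they may be unequal)
    """
    k, total = len(sizes), sum(sizes)
    # Divisor based on group sizes; equals the common n when all sizes are equal.
    n0 = (total - sum(n * n for n in sizes) / total) / (k - 1)
    s2_a = (s2_b - s2_w) / n0
    return s2_a, s2_a / (s2_w + s2_a)

s2_a, proportion = added_variance(44.49, 6.07, [10, 9, 11])
print(round(s2_a, 2), round(proportion, 2))
```

With equal group sizes, `n0` reduces to the common group size, which is the simpler divisor used when all groups are of the same size.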


Example 11.4.3.

The lengths (mm) from the anterior end of the narial opening [...] of three groups of pigeons are given in the first three columns of Table 11.14. Find whether or not the groups are homogeneous.

Solution :

To test for the homogeneity of variances of the groups of pigeons exposed to the uncontrolled independent variable, viz., habitat, a one-way model II anova is applied.









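(a) Partitioning of sums of squares :

The raw scores entered in the first three columns of Table 11.14 are used in working out $\Sigma X_1$, $\Sigma X_2$, $\Sigma X_3$, $\Sigma X_1^2$, $\Sigma X_2^2$ and $\Sigma X_3^2$.

$n_1 = 10$ ; $n_2 = 10$ ; $n_3 = 10$ ; $N = n_1 + n_2 + n_3 = 10 + 10 + 10 = 30$.

$$SS_t = \Sigma X_1^2 + \Sigma X_2^2 + \Sigma X_3^2 - \frac{(\Sigma X_1+\Sigma X_2+\Sigma X_3)^2}{N} = 236.91 + 177.31 + 156.72 - \frac{(47.9+41.7+39.2)^2}{30} = 17.96\,;$$

$df_t = N - 1 = 30 - 1 = 29.$

$$SS_b = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \frac{(\Sigma X_3)^2}{n_3} - \frac{(\Sigma X_1+\Sigma X_2+\Sigma X_3)^2}{N} = \frac{(47.9)^2}{10} + \frac{(41.7)^2}{10} + \frac{(39.2)^2}{10} - \frac{(47.9+41.7+39.2)^2}{30} = 4.01\,;$$

$df_b = k - 1 = 3 - 1 = 2.$

$$SS_w = SS_t - SS_b = 17.96 - 4.01 = 13.95\,; \quad df_w = N - k = 30 - 3 = 27.$$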


(b) Computation of variances and F ratio :

$$s_b^2 = \frac{SS_b}{df_b} = \frac{4.01}{2} = 2.01\,; \quad s_w^2 = \frac{SS_w}{df_w} = \frac{13.95}{27} = 0.52\,; \quad F = \frac{s_b^2}{s_w^2} = \frac{2.01}{0.52} = 3.87.$$

Table 11.15. Anova table for pigeon narial opening data.

Sources of variation    Sums of squares    df    Variances    F

Between groups                4.01          2       2.01     3.87
Within groups                13.95         27       0.52
Total                        17.96         29




(c) Significance of F :

Critical F values ($df_b$ = 2, $df_w$ = 27) are quoted from Table H of Appendix.

$F_{.05(2,27)} = 3.35$ ; $F_{.01(2,27)} = 5.49$.

As the computed F exceeds the critical $F_{.05}$, the computed F is considered significant (P < 0.05). So, there is a significant added variance component between the groups and the groups are not homogeneous.

Because all the groups are of equal size (n = 10),

$$s_a^2 = \frac{s_b^2 - s_w^2}{n} = \frac{2.01 - 0.52}{10} = 0.15\,; \qquad \frac{s_a^2}{s_w^2+s_a^2} = \frac{0.15}{0.52+0.15} = 0.22.$$

Thus, a proportion of 0.22 of the variance of the dependent variable due to uncontrolled factors is accounted for by the added variance component.




11.5 KRUSKAL-WALLIS NONPARAMETRIC ANOVA 

This is an efficient rank-dependent non- 
parametric method of one-way anova for 
finding the significance of difference between 
the observations of two or more groups at a 
time. The Kruskal-Wallis statistic H, when 
computed from only two groups, is identical 
with the square of the z score computed from 
the Mann-Whitney U from the same two 
groups. 

Assumptions 

For the Kruskal-Wallis anova, it should be 
justifiable to assume that : 

(a) each score of the dependent variable occurs in the sample at random and independent of all other scores; in other words, the error terms of the scores are independent of each other ;

(b) either the dependent variable is an ordinal one with its magnitudes already given in ranks for the individuals of the sample, or it is a continuous or discrete measurement variable whose scores can be converted into ranks. This assumption makes the test unsuitable for attributes or qualitative variables.

It may be applied to continuous, discrete or 
ordinal variables, normal or non-normal 
distributions and small samples, because it does 
not require the assumptions for continuous and 
normal distributions of the dependent variable 
scores. 

Computation 

(a) Ranks are given in an ascending order to the scores of all the groups taken together. Two or more identical (tied) scores, occurring either in the same group or in different groups, are each given an average rank which is the mean of the actual ranks they would have got if they were consecutive non-identical scores. The score, next higher than the tied set, is allotted the rank that it would have occupied if the tied scores preceding it had separate consecutive ranks instead of an average one. In Example 11.5.1, for instance, the score 7.8 occurs once in each of the groups 1 and 2. These two scores would have ranks of 6 and 7 if they were consecutive scores and not tied. So, each of these two tied scores is given an average rank of (6 + 7)/2 or 6.5 while the next higher score, viz., 8.2 in group 1, receives the rank of 8 (Table 11.16).

Being a rank statistic, the Kruskal-Wallis statistic H suffers from inaccuracies because (i) an average rank is allotted to all the scores of a tied set instead of their true separate ranks, and (ii) successive ranks are given to consecutive scores with no consideration of the varying differences in magnitude between those scores.

(b) The ranks of each group are added separately to give the rank sums ($R_i$) of the respective groups.


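(c) The mean rank ($\bar{R}_i$) of each group and the mean rank ($\bar{R}$) of all the groups are computed using respectively the size ($n_i$) of each group and the total size (N) of all the groups ($N = \Sigma n_i$).

$$\bar{R}_i = \frac{R_i}{n_i}\,; \qquad \bar{R} = \frac{\Sigma R_i}{\Sigma n_i} = \frac{\Sigma R_i}{N}.$$

(d) Where there are k number of groups, $N = n_1 + n_2 + \cdots + n_k$ ; $df = k - 1$.

$$H = \frac{12}{N(N+1)}\left[n_1(\bar{R}_1-\bar{R})^2 + n_2(\bar{R}_2-\bar{R})^2 + \cdots + n_k(\bar{R}_k-\bar{R})^2\right].$$

Alternatively, omitting steps (c) and (d),

$$H = \frac{12}{N(N+1)}\,\Sigma\frac{R_i^2}{n_i} - 3(N+1) = \frac{12}{N(N+1)}\left[\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right] - 3(N+1).$$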
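The ranking rule for tied scores and the rank-sum form of H can be sketched in plain Python as below. The function names are hypothetical, and no correction for ties is applied beyond the average ranks, in keeping with the procedure described above. The data are the three groups of hemoglobin scores used in Example 11.5.1.

```python
def average_ranks(scores):
    """Ascending ranks, giving each set of tied scores its average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same score as position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the consecutive ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H from the rank sums: 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

hemoglobin = [
    [7.3, 8.2, 8.4, 7.2, 7.0, 7.6, 7.8, 7.4],
    [10.4, 7.8, 9.6, 9.2, 10.0, 8.8, 9.4, 9.2],
    [12.0, 10.6, 13.4, 14.6, 14.0, 14.0, 11.0, 12.0],
]
print(round(kruskal_wallis_h(hemoglobin), 2))
```

The computed H is then compared with the critical chi square for df = k - 1.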




Significance 

The null hypothesis ($H_0$) contends that the computed H is not significantly different from 0 and that there is no significant difference between the scores (or ranks) of the different groups. In other words, the $H_0$ contends that the groups come from the same population and so, their medians are identical. If the $H_0$ is correct, the distribution of the statistic H would be almost identical with the chi square distribution for the same df. To find the probability (P) of the $H_0$ being correct, the computed H is compared with the critical $\chi^2$ for the chosen significance level (α). The computed H is considered significant and the groups are considered to differ significantly only if the computed H exceeds or equals the critical $\chi^2$ value for the chosen α (P ≤ α). If the critical $\chi^2$ is higher than the computed H, there is no significant difference between the scores (or ranks) of the groups (P > α).

Multiple comparison by Mann-Whitney test

If the computed H is found significant and there are more than two groups in the experiment, multiple comparison Mann-Whitney U test is performed as an a-priori test between groups chosen while designing the experiment, to find which of those groups differ significantly from each other. But when the Kruskal-Wallis test has been done between only two groups, a significant H leads directly to the inference that the two groups have a significant difference ; no multiple comparison Mann-Whitney test needs to follow.

Where the computed H is not significant (P > α), the groups do not differ significantly. In this case also, no multiple comparison test needs to be worked out.


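Example 11.5.1.

Blood hemoglobin scores were obtained from three groups treated with different levels of an anti-anemic factor, after [...]. Apply the Kruskal-Wallis anova to find whether the groups differ significantly, and explore the significance of difference between groups 1 and 2, and between groups 2 and 3.

Group 1 : 7.3, 8.2, 8.4, 7.2, 7.0, 7.6, 7.8, 7.4 ($n_1$ = 8).
Group 2 : 10.4, 7.8, 9.6, 9.2, 10.0, 8.8, 9.4, 9.2 ($n_2$ = 8).
Group 3 : 12.0, 10.6, 13.4, 14.6, 14.0, 14.0, 11.0, 12.0 ($n_3$ = 8).

Solution :

(a) The scores of the three groups are entered in the first, third and fifth columns of Table 11.16. Ranks are given to the scores in an ascending order, taking all the groups together. Average ranks are given to tied scores.

(b) The ranks of each group are added to give the rank sum ($R_i$) of that group.

(c) The mean rank ($\bar{R}_i$) of each group as well as the mean rank ($\bar{R}$) of all the groups taken together, are computed.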




Table 11.16. Table for nonparametric anova of blood hemoglobin data.

                  Group 1                Group 2                Group 3
              Scores (X₁)  Ranks     Scores (X₂)  Ranks     Scores (X₃)  Ranks

                  7.3        3          10.4       16          12.0       19.5
                  8.2        8           7.8        6.5        10.6       17
                  8.4        9           9.6       14          13.4       21
                  7.2        2           9.2       11.5        14.6       24
                  7.0        1          10.0       15          14.0       22.5
                  7.6        5           8.8       10          14.0       22.5
                  7.8        6.5         9.4       13          11.0       18
                  7.4        4           9.2       11.5        12.0       19.5

Rank sums                   38.5 (R₁)              97.5 (R₂)             164.0 (R₃)
Mean ranks                   4.81 (R̄₁)             12.19 (R̄₂)             20.5 (R̄₃)


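$$\bar{R}_1 = \frac{R_1}{n_1} = \frac{38.5}{8} = 4.81\,; \quad \bar{R}_2 = \frac{R_2}{n_2} = \frac{97.5}{8} = 12.19\,; \quad \bar{R}_3 = \frac{R_3}{n_3} = \frac{164.0}{8} = 20.5\,;$$

$$\bar{R} = \frac{\Sigma R_i}{\Sigma n_i} = \frac{38.5 + 97.5 + 164.0}{8 + 8 + 8} = 12.50.$$

(d) The statistic H is computed using the mean ranks.

$N = n_1 + n_2 + n_3 = 8 + 8 + 8 = 24$. $df = k - 1 = 3 - 1 = 2$.

$$H = \frac{12}{N(N+1)}\left[n_1(\bar{R}_1-\bar{R})^2 + n_2(\bar{R}_2-\bar{R})^2 + n_3(\bar{R}_3-\bar{R})^2\right]$$

$$= \frac{12}{24 \times 25}\left[8(4.81-12.50)^2 + 8(12.19-12.50)^2 + 8(20.5-12.50)^2\right] = 19.72.$$

[Alternatively, H may be computed directly from $R_i$ scores, omitting steps (c) and (d).

$$H = \frac{12}{N(N+1)}\,\Sigma\frac{R_i^2}{n_i} - 3(N+1) = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1}+\frac{R_2^2}{n_2}+\frac{R_3^2}{n_3}\right) - 3(N+1)$$

$$= \frac{12}{24 \times 25}\left(\frac{38.5^2}{8}+\frac{97.5^2}{8}+\frac{164.0^2}{8}\right) - 3 \times 25 = 19.71.]$$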


(e) Critical $\chi^2$ values with 2 degrees of freedom are quoted from Table C of Appendix.

$\chi^2_{.05(2)} = 5.99$ ; $\chi^2_{.01(2)} = 9.21$ ; $\chi^2_{.001(2)} = 13.82$.

As the computed H exceeds the critical $\chi^2$ for the 0.001 level, the probability of the null hypothesis being correct is considered too low (P < 0.001). Thus, there are significant differences between the groups.

(f) Multiple comparison Mann-Whitney U test is performed to explore the significance of difference between groups 1 and 2, and between groups 2 and 3. (See § 9.6 for details.)




Groups 1 and 2 :

(i) Scores of these two groups are entered in Table 11.17, ranks are assigned to them in an ascending order taking both the groups together, and the rank sums ($R_1$ and $R_2$) are computed for the two groups separately.

(ii) Any of the two rank sums is used for computing the statistic U and for converting it to z. Thus,

$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 = 8 \times 8 + \frac{8 \times 9}{2} - 38.5 = 61.5.$$

$$\bar{U} = \frac{n_1 n_2}{2} = \frac{8 \times 8}{2} = 32.0\,; \qquad s_U = \sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}} = \sqrt{\frac{8 \times 8(8+8+1)}{12}} = 9.52.$$

$$z = \frac{U_1 - \bar{U}}{s_U} = \frac{61.5 - 32.0}{9.52} = 3.10.$$

(iii) The probability P of the correctness of the $H_0$ is worked out using the normal curve table (Table A of Appendix).

P = 2 [0.5000 - (area from μ to computed z of 3.10)] = 2 (0.5000 - 0.4990) = 0.002.

As P is too low, the difference between groups 1 and 2 is considered significant (P = 0.002).

Groups 2 and 3 :

(i) The scores of these two groups are entered in Table 11.18, ranks are assigned to them taking both the groups together, and the rank sums ($R_2$ and $R_3$) are computed for the two groups separately.

(ii) Same as in the previous case :

$$U_2 = n_2 n_3 + \frac{n_2(n_2+1)}{2} - R_2 = 8 \times 8 + \frac{8 \times 9}{2} - 36.0 = 64.0. \quad \text{(Table 11.18)}$$

$$\bar{U} = \frac{n_2 n_3}{2} = \frac{8 \times 8}{2} = 32.0\,; \qquad s_U = \sqrt{\frac{n_2 n_3 (n_2+n_3+1)}{12}} = \sqrt{\frac{8 \times 8(8+8+1)}{12}} = 9.52.$$

Table 11.17. Mann-Whitney test for groups 1 and 2.

      X₁      Ranks        X₂      Ranks

      7.3      3          10.4     16
      8.2      8           7.8      6.5
      8.4      9           9.6     14
      7.2      2           9.2     11.5
      7.0      1          10.0     15
      7.6      5           8.8     10
      7.8      6.5         9.4     13
      7.4      4           9.2     11.5

Σ             38.5 (R₁)            97.5 (R₂)


Table 11.18. Mann-Whitney test for groups 2 and 3.

      X₂      Ranks        X₃      Ranks

     10.4      8          12.0     11.5
      7.8      1          10.6      9
      9.6      6          13.4     13
      9.2      3.5        14.6     16
     10.0      7          14.0     14.5
      8.8      2          14.0     14.5
      9.4      5          11.0     10
      9.2      3.5        12.0     11.5

Σ             36.0 (R₂)           100.0 (R₃)





$$z = \frac{U_2 - \bar{U}}{s_U} = \frac{64.0 - 32.0}{9.52} = 3.36.$$

(iii) The normal curve table (Table A) is used for finding P.

P = 2 [0.5000 - (area from μ to computed z of 3.36)] = 2 (0.5000 - 0.4996) = 0.0008.

As P is too low, the difference between groups 2 and 3 is considered significant (P = 0.0008).
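The conversion of a rank sum to U and then to z, as used for both pairs of groups above, can be sketched in Python. The function name `mann_whitney_z` is hypothetical; the rank sums passed in are those of Tables 11.17 and 11.18.

```python
import math

def mann_whitney_z(rank_sum_1, n1, n2):
    """U from the rank sum of the first group, and its normal deviate z."""
    u = n1 * n2 + n1 * (n1 + 1) / 2 - rank_sum_1
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean_u) / sd_u

u_12, z_12 = mann_whitney_z(38.5, 8, 8)   # groups 1 and 2
u_23, z_23 = mann_whitney_z(36.0, 8, 8)   # groups 2 and 3
print(u_12, round(z_12, 2), u_23, round(z_23, 2))
```

Each z is then referred to the normal curve table to obtain the two-tailed P.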


Example 11.5.2.

Find whether or not there is a significant difference between the strengths of kneejerk reflexes (degrees of arc) of the following groups.

Gr. 1 (athletes) : 31, 30, 22, 30, 26, 28, 19, 36, 37.
Gr. 2 (nonathletes) : 35, 26, 14, 20, 11, 14, 21, 31, 27, 24, 10.

Solution :

The scores ($X_1$ and $X_2$) of the two groups are entered in the first and third columns, respectively, of Table 11.19.


Table 11.19. Table for nonparametric anova of kneejerk data. ($n_1$ = 9 ; $n_2$ = 11)

            Athletes                          Nonathletes
  Kneejerk strengths (X₁)   Ranks     Kneejerk strengths (X₂)   Ranks

            31              16.5                35              18
            30              14.5                26              10.5
            22               8                  14               3.5
            30              14.5                20               6
            26              10.5                11               2
            28              13                  14               3.5
            19               5                  21               7
            36              19                  31              16.5
            37              20                  27              12
                                                24               9
                                                10               1



(b) The rank sum ($R_i$) and the mean rank ($\bar{R}_i$) of each group are computed. The mean rank ($\bar{R}$) of the scores of all the groups is also computed.

$$\bar{R}_1 = \frac{R_1}{n_1} = \frac{121.0}{9} = 13.44\,; \quad \bar{R}_2 = \frac{R_2}{n_2} = \frac{89.0}{11} = 8.09\,; \quad \bar{R} = \frac{\Sigma R_i}{\Sigma n_i} = \frac{121.0 + 89.0}{9 + 11} = 10.5.$$

(c) The statistic H is computed using the mean ranks.

$N = n_1 + n_2 = 9 + 11 = 20$. $df = k - 1 = 2 - 1 = 1$.




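$$H = \frac{12}{N(N+1)}\left[n_1(\bar{R}_1-\bar{R})^2 + n_2(\bar{R}_2-\bar{R})^2\right] = \frac{12}{20 \times 21}\left[9(13.44-10.5)^2 + 11(8.09-10.5)^2\right] = 4.05.$$

[Alternatively, H may be computed directly from the rank sums ($R_i$).]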



(d) Critical $\chi^2$ values (df = 1) are quoted from Table C of Appendix.

$\chi^2_{.05(1)} = 3.84$ ; $\chi^2_{.02(1)} = 5.41$ ; $\chi^2_{.01(1)} = 6.64$.

As the computed H is found higher than the critical $\chi^2$ for the 0.05 level, it is considered significant. So, the scores of the two groups differ significantly (P < 0.05).


11.6 TWO-WAY ANOVA 

A two-way anova is used to investigate the simultaneous effects of two independent variables or factors on a dependent variable. Here, given combinations of levels of two factors are applied on the individuals of the sample. For two-way anova, a classification table is framed with the applied levels of one factor represented along its rows, and those of the other represented along its columns (Tables 11.20 and 11.22).

Two-way anova with replications

A two-way anova with replications is worked out when every combination of the two factors (one level of each) has been applied on more than one individual. The effect of each combination of independent variables is given by the replicated observations in a group of individuals. In the classification table, scores of each such replicated group of observations are included in a particular cell meant for a given combination of the factor levels. The variance of scores within these cells gives the within-cells or within-groups variance ($s_w^2$).

Each score of the dependent variable, occurring in any cell of the classification table, is represented here by $X_{rci}$ where the subscripts r, c and i stand respectively for the row and the column to which the cell belongs, and the serial position of the score within the cell. Thus, $X_{213}$ is the 3rd score in the cell belonging to row 2 and column 1. The mean scores of the cells, called the group means, are represented by $\bar{X}_{rc}$. Thus, $\bar{X}_{21}$ is the group mean for the cell belonging to row 2 and column 1. The row means are shown as $\bar{X}_{r.}$ while the column means are indicated by $\bar{X}_{.c}$ ; thus, $\bar{X}_{2.}$ and $\bar{X}_{.2}$ are the means for row 2 and column 2 respectively. The grand mean of all the scores is represented by $\bar{X}$.

Partitioning of sum of squares :

The total sum of squares ($SS_t$) is the sum of squared deviations of all scores ($X_{rci}$) from $\bar{X}$ and has (N - 1) degrees of freedom where N stands for the total number of scores in the data. The total variance $s_t^2$ is obtained by dividing $SS_t$ by its df.

$$SS_t = \Sigma(X_{rci} - \bar{X})^2 = \Sigma X_{rci}^2 - \frac{(\Sigma X_{rci})^2}{N}\,; \quad df_t = N - 1\,; \quad s_t^2 = \frac{SS_t}{df_t} = \frac{SS_t}{N-1}.$$

In this two-way anova, $SS_t$ is partitioned into four independent and additive components, viz., between-rows sum of squares ($SS_r$), between-columns sum of squares ($SS_c$), interaction sum of squares ($SS_i$) and within-cells or within-groups sum of squares ($SS_w$).

$$SS_t = SS_r + SS_c + SS_i + SS_w.$$

The between-rows sum of squares ($SS_r$) is obtained by multiplying the sum of squared deviations of the row means ($\bar{X}_{r.}$) from the grand mean ($\bar{X}$), with the product of the number of columns (c) and the number (n) of scores per cell. Because there are r number of rows and one df is lost in using $\bar{X}$ as an estimate of the parametric mean, $SS_r$ has the df of (r - 1).

$$SS_r = nc\left[\Sigma(\bar{X}_{r.} - \bar{X})^2\right] = \frac{\Sigma(\Sigma X_{r.})^2}{nc} - \frac{(\Sigma X_{rci})^2}{N}\,; \quad df_r = r - 1\,;$$

where $\Sigma X_{r.}$ is the sum of the scores of each row, $\Sigma(\Sigma X_{r.})^2$ is the sum of the squared row sums for r number of rows, and $\Sigma X_{rci}$ is the sum of all scores in the sample.

The between-rows variance ($s_r^2$) is obtained by dividing $SS_r$ by its df ; it is the variance due to the added effects of the independent variable represented along the rows of the table.

$$s_r^2 = \frac{SS_r}{df_r} = \frac{SS_r}{r-1}.$$

The between-columns sum of squares ($SS_c$) is obtained by multiplying the sum of squared deviations of the column means ($\bar{X}_{.c}$) from $\bar{X}$, with the product of the number of rows (r) and the number (n) of scores per cell. Because the columns number c and one df is lost in using $\bar{X}$, $df_c = c - 1$.

$$SS_c = nr\left[\Sigma(\bar{X}_{.c} - \bar{X})^2\right] = \frac{\Sigma(\Sigma X_{.c})^2}{nr} - \frac{(\Sigma X_{rci})^2}{N}\,; \quad df_c = c - 1\,;$$

where $\Sigma X_{.c}$ is the sum of scores of each column, $\Sigma(\Sigma X_{.c})^2$ is the sum of squared column sums for c number of columns, and $\Sigma X_{rci}$ is the sum of all scores in the sample.

The between-columns variance ($s_c^2$) is obtained by dividing $SS_c$ by its df ; it is the variance owing to the added effects of the second independent variable represented along the columns of the table.

$$s_c^2 = \frac{SS_c}{df_c} = \frac{SS_c}{c-1}.$$

To use in the computation of the interaction SS, the sum of squares between cell means ($SS_{rc}$) is computed by multiplying the sum of squared deviations of cell means $\bar{X}_{rc}$ from $\bar{X}$ with the number (n) of scores per cell. As the number of cells or cell means equals rc and one df is lost in using $\bar{X}$, $SS_{rc}$ has the df of (rc - 1).

$$SS_{rc} = n\left[\Sigma(\bar{X}_{rc} - \bar{X})^2\right] = \frac{\Sigma(\Sigma X_{rc})^2}{n} - \frac{(\Sigma X_{rci})^2}{N}\,; \quad df_{rc} = rc - 1\,;$$

where $\Sigma X_{rc}$ is the sum of scores of each cell and $\Sigma(\Sigma X_{rc})^2$ is the sum of squared cell sums.

In two-way and higher orders of anova, there is a possibility of interaction between the effects of independent variables, either enhancing the effect of one of them on the dependent variable due to the simultaneous effect of another, or reducing the effect of one owing to that of the other (interference). Interaction sum of squares ($SS_i$) and interaction




iriance (sf) are the measures of variations of 
, orcS of the dependent variable due to such 
joint effects of more than one independent 
vanflblc- 

55. = SS rc - SS C - SS r 


<jf ( = </•- lXc- l); 

, , 55/ 55/ 

'' = 4/ (r-lXc-D* 


The within-cells or residual sum of squares ($SS_w$) is that component of $SS_t$ which is left after partitioning off other SS components with known sources; it is obtained as the sum of squared deviations of individual scores ($X_{rci}$) from the respective cell means ($\bar{X}_{rc}$), or by subtracting the sum of $SS_r$, $SS_c$ and $SS_i$ from $SS_t$. As one $df$ is lost in keeping each of the $rc$ number of cell means unchanged, $SS_w$ has the $df$ of $(N - rc)$.

$$SS_w = \sum(X_{rci} - \bar{X}_{rc})^2 = SS_t - SS_r - SS_c - SS_i; \qquad df_w = N - rc = rc(n - 1).$$

The within-cells variance ($s_w^2$) is called the error variance, residual variance or remainder variance as, unlike other variance estimates, its source is not known. It is a measure of the uncontrolled variations of individual scores due to random effects of sampling.

$$s_w^2 = \frac{SS_w}{df_w} = \frac{SS_w}{N - rc}.$$
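The partitioning described above can be sketched in Python as follows. This is only an illustration under stated assumptions: every cell holds the same number ($n$) of scores, and the function name `partition_two_way` and the small 2 x 2 data set are hypothetical, not taken from the text.

```python
from statistics import fmean


def partition_two_way(cells):
    """Partition sums of squares for a two-way anova with replication.

    cells[i][j] is the list of n scores in row i, column j (equal n per cell).
    """
    r, c, n = len(cells), len(cells[0]), len(cells[0][0])
    scores = [x for row in cells for cell in row for x in cell]
    grand = fmean(scores)
    row_means = [fmean([x for cell in row for x in cell]) for row in cells]
    col_means = [fmean([x for row in cells for x in row[j]]) for j in range(c)]
    cell_means = [[fmean(cell) for cell in row] for row in cells]

    ss_r = n * c * sum((m - grand) ** 2 for m in row_means)   # between rows
    ss_c = n * r * sum((m - grand) ** 2 for m in col_means)   # between columns
    ss_rc = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_i = ss_rc - ss_c - ss_r                                # interaction
    ss_w = sum((x - cell_means[i][j]) ** 2                    # within cells
               for i in range(r) for j in range(c) for x in cells[i][j])
    ss_t = sum((x - grand) ** 2 for x in scores)              # total
    df = {"r": r - 1, "c": c - 1, "i": (r - 1) * (c - 1), "w": r * c * (n - 1)}
    return {"SS_r": ss_r, "SS_c": ss_c, "SS_rc": ss_rc,
            "SS_i": ss_i, "SS_w": ss_w, "SS_t": ss_t, "df": df}


# Hypothetical 2 x 2 layout with n = 2 scores per cell.
demo = [[[1, 3], [5, 7]],
        [[2, 4], [10, 12]]]
result = partition_two_way(demo)
```

For this hypothetical layout the four components (rows, columns, interaction and within cells) add up to the total sum of squares, which is a convenient check on any hand computation.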

F ratio in two-way model I anova :

In a model I or fixed model anova involving two treatment variables, F ratios are computed to test the significance of $s_r^2$, $s_c^2$ and $s_i^2$, using $s_w^2$ as the error term in the denominator in each case. The $df_1$ and $df_2$ of each F ratio are those of its numerator variance and denominator variance, respectively; again, each variance has the same $df$ as the SS used in its computation.

$$F_r = \frac{s_r^2}{s_w^2}; \qquad df_1 = df_r = r - 1; \qquad df_2 = df_w = N - rc.$$

$$F_c = \frac{s_c^2}{s_w^2}; \qquad df_1 = df_c = c - 1; \qquad df_2 = df_w = N - rc.$$

$$F_i = \frac{s_i^2}{s_w^2}; \qquad df_1 = df_i = (r - 1)(c - 1); \qquad df_2 = df_w = N - rc.$$

A computed F is considered significant, only if it exceeds or equals the critical F (with the same $df_1$ and $df_2$ as the computed F) for the chosen level of significance ($P \le \alpha$). A significant $F_r$ or $F_c$ denotes a significant change in the dependent variable due to the effect of the treatment variable represented along the rows or the columns, respectively. A significant $F_i$ implies a significant interaction between the two treatment variables.

In case the F ratio of a variance estimate is found significant, omega square is computed for that variance estimate to measure what proportion of the variance of dependent variable scores is related to the corresponding effect (row, column or interaction).

$$\omega_r^2 = \frac{SS_r - (r - 1)s_w^2}{s_w^2 + SS_t}; \qquad \omega_c^2 = \frac{SS_c - (c - 1)s_w^2}{s_w^2 + SS_t}; \qquad \omega_i^2 = \frac{SS_i - (r - 1)(c - 1)s_w^2}{s_w^2 + SS_t}.$$

F ratio in two-way model II anova :

In a model II or random model anova involving two classification variables, both beyond the control of the investigator, the F ratio is first computed as $F_i$ for testing the significance of the interaction effect ($s_i^2$), using $s_w^2$ as the error term in the denominator. Then, the F ratios for $s_r^2$ (row effect) and $s_c^2$ (column effect) are computed with $s_i^2$ as the denominator variance.

$$F_i = \frac{s_i^2}{s_w^2}; \qquad F_r = \frac{s_r^2}{s_i^2}; \qquad F_c = \frac{s_c^2}{s_i^2}.$$

As $s_w^2$ is ordinarily lower than $s_i^2$, using the latter as the error term yields smaller and less significant F ratios and decreases the chances of type I error of inference.

F ratio in two-way model III anova :

In a two-way mixed model anova involving one treatment variable and one random classification variable, the denominator variance or error term used in computing the F ratio depends on the following arrangements of the independent variables along rows and columns.

F ratios                    Rows          Columns
$F_r = s_r^2 / s_w^2$       random        treatment
$F_r = s_r^2 / s_i^2$       treatment     random
$F_c = s_c^2 / s_w^2$       treatment     random
$F_c = s_c^2 / s_i^2$       random        treatment
$F_i = s_i^2 / s_w^2$       any arrangement

Two-way anova without replication

A two-way anova without replication is used when every combination of the two factors (one level of each) has been applied on only one individual. Thus, the effect of each such combination of independent variables is given here by a single observation only.

Each individual score of the dependent variable is represented here by $X_{rc}$ where the subscripts $r$ and $c$ stand respectively for the row and the column to which the score belongs. Thus, $X_{22}$ is the score belonging to the second row and the second column. The mean of a row is denoted by $\bar{X}_{r.}$ and that of a column by $\bar{X}_{.c}$; thus, $\bar{X}_{2.}$ is the mean score of the second row and $\bar{X}_{.3}$ is the mean score of the third column. $\bar{X}$ is the grand mean of all the scores.

Partitioning of sum of squares :

The total sum of squares ($SS_t$) is computed as the sum of squared deviations of all the scores ($X_{rc}$) from the grand mean ($\bar{X}$). The total variance ($s_t^2$) is computed from $SS_t$.

$$SS_t = \sum(X_{rc} - \bar{X})^2 = \sum X_{rc}^2 - \frac{(\sum X_{rc})^2}{N}; \qquad df_t = N - 1.$$

The computed $SS_t$ is partitioned into $SS_r$, $SS_c$ and $SS_i$. Each of the latter is then divided by its $df$ to get the corresponding variance estimate.

The between-rows sum of squares ($SS_r$) is computed by multiplying the sum of squared deviations of the row means ($\bar{X}_{r.}$) from the grand mean ($\bar{X}$), with the number ($c$) of columns. Where $\sum X_{r.}$ is the sum of scores of each row, $\sum(\sum X_{r.})^2$ is the sum of squared row sums for $r$ number of rows, and $\sum X_{rc}$ is the sum of all scores,

$$SS_r = c\sum(\bar{X}_{r.} - \bar{X})^2 = \frac{\sum(\sum X_{r.})^2}{c} - \frac{(\sum X_{rc})^2}{N}; \qquad df_r = r - 1.$$

The between-rows variance ($s_r^2$) is a measure of the average variation of row means and is computed from $SS_r$.

$$s_r^2 = \frac{SS_r}{r - 1}.$$

$SS_c$ is the between-columns sum of squares; the between-columns variance ($s_c^2$) measures the average variation of column means. Where
$\sum(\sum X_{.c})^2$ is the total of the squared sums of scores for $c$ number of columns,

$$SS_c = r\sum(\bar{X}_{.c} - \bar{X})^2 = \frac{\sum(\sum X_{.c})^2}{r} - \frac{(\sum X_{rc})^2}{N}; \qquad df_c = c - 1; \qquad s_c^2 = \frac{SS_c}{c - 1}.$$

The residual sum of squares ($SS_i$) is that component of $SS_t$ which remains after partitioning off $SS_r$ and $SS_c$. The residual, remainder or interaction variance ($s_i^2$) includes both the variance due to individual differences associated with random sampling, and the variance owing to interactions, if any, between the independent variables.

$$SS_t = SS_r + SS_c + SS_i.$$

$$SS_i = SS_t - (SS_r + SS_c); \qquad df_i = (r - 1)(c - 1); \qquad s_i^2 = \frac{SS_i}{(r - 1)(c - 1)}.$$

F ratio :

For all three models of such anova, F ratios are computed for testing the significance of $s_r^2$ and $s_c^2$, using $s_i^2$ as the denominator variance.

$$F_r = \frac{s_r^2}{s_i^2}; \qquad F_c = \frac{s_c^2}{s_i^2}.$$

Each F ratio is then compared with critical $F_\alpha$ values having $df_1$ and $df_2$ identical with those of respectively the numerator variance and the denominator variance used in the computed F.

Both $s_r^2$ and $s_c^2$ may be tested in this way for their significance in case of model II anova, irrespective of the significance or otherwise of the interaction effect. But F ratios, computed with $s_i^2$ as the denominator, cannot be used for testing $s_r^2$ or $s_c^2$ in a model I anova unless it can be assumed that $s_i^2$ is free from interaction effects. In a model III anova, only the variance estimate due to the treatment variable may be tested for significance by computing F with $s_i^2$ as the denominator, unless it can be assumed that $s_i^2$ is free from the interaction effect.

Example 11.6.1.

Apply two-way anova without replication to interpret the following data of tracheal ventilations (X ml per minute) of a sample of insects before and after exposure to a hailstorm. ($\alpha$ = 0.05.)

Individuals :            1      2      3      4      5      6      7      8      9      10
Tracheal ventilations :
(a) before :             75.2   86.0   78.0   75.4   84.5   65.0   72.0   88.0   71.9   68.0
(b) after :              65.2   79.6   80.0   70.4   75.5   74.2   68.0   74.0   59.1   54.0

Solution :

Levels of hailstorm, i.e., before and after exposure, constitute a classification or random variable beyond the control of the investigator. The individual differences between the insects, resulting from random sampling, may also be considered to form a random variable. So, a model II two-way anova without replication may be worked out in interpreting the data. The data are arranged in the rows and columns of Table 11.20.


Table 11.20. Classification table for two-way anova of tracheal ventilation data.

Individuals        Exposure to hailstorm ($X_{rc}$)      $\sum X_{r.}$      $\bar{X}_{r.}$
                   Before        After
1                  75.2          65.2                    140.4              70.2
2                  86.0          79.6                    165.6              82.8
3                  78.0          80.0                    158.0              79.0
4                  75.4          70.4                    145.8              72.9
5                  84.5          75.5                    160.0              80.0
6                  65.0          74.2                    139.2              69.6
7                  72.0          68.0                    140.0              70.0
8                  88.0          74.0                    162.0              81.0
9                  71.9          59.1                    131.0              65.5
10                 68.0          54.0                    122.0              61.0
$\sum X_{.c}$      764.0         700.0                   1464.0 ($\sum X_{rc}$)
$\bar{X}_{.c}$     76.4          70.0                                       73.2 ($\bar{X}$)


(a) The sum of tracheal ventilation scores of each row ($\sum X_{r.}$) and that of each column ($\sum X_{.c}$) are worked out in Table 11.20 and used in computing the respective row means ($\bar{X}_{r.}$) and column means ($\bar{X}_{.c}$). Thus, the column means, $\bar{X}_{.1}$ and $\bar{X}_{.2}$, are computed using the respective $\sum X_{.c}$ values and the numbers ($n$) of scores of the corresponding columns.

$$\bar{X}_{.1} = \frac{\sum X_{.1}}{n} = \frac{764.0}{10} = 76.4; \qquad \bar{X}_{.2} = \frac{\sum X_{.2}}{n} = \frac{700.0}{10} = 70.0.$$

(b) Similarly, the $\sum X_{r.}$ value and the number ($n$) of scores of each row are used in working out the row mean $\bar{X}_{r.}$ of that row. For example,

$$\bar{X}_{1.} = \frac{\sum X_{1.}}{n} = \frac{140.4}{2} = 70.2; \qquad \bar{X}_{2.} = \frac{\sum X_{2.}}{n} = \frac{165.6}{2} = 82.8.$$

(c) Total sample size ($N$) is worked out using the numbers of rows ($r$) and columns ($c$) while all scores of the sample are totalled to get $\sum X_{rc}$. These are then used in computing the grand mean ($\bar{X}$).

$$N = rc = 10 \times 2 = 20. \qquad \bar{X} = \frac{\sum X_{rc}}{N} = \frac{1464.0}{20} = 73.2.$$

(d) Total sum of squares ($SS_t$) is worked out and then partitioned into the between-rows, between-columns and residual or interaction sums of squares; their respective degrees of freedom are also worked out.

$SS_t = \sum(X_{rc} - \bar{X})^2 = (75.2 - 73.2)^2 + (86.0 - 73.2)^2 + (78.0 - 73.2)^2 + \ldots + (74.0 - 73.2)^2 + (59.1 - 73.2)^2 + (54.0 - 73.2)^2 = 1400.32.$
$df_t = N - 1 = 20 - 1 = 19.$

$SS_r = c\sum(\bar{X}_{r.} - \bar{X})^2 = 2[(70.2 - 73.2)^2 + (82.8 - 73.2)^2 + \ldots + (65.5 - 73.2)^2 + (61.0 - 73.2)^2] = 946.66.$
$df_r = r - 1 = 10 - 1 = 9.$

$SS_c = r\sum(\bar{X}_{.c} - \bar{X})^2 = 10[(76.4 - 73.2)^2 + (70.0 - 73.2)^2] = 204.80.$
$df_c = c - 1 = 2 - 1 = 1.$

$SS_i = SS_t - (SS_r + SS_c) = 1400.32 - (946.66 + 204.80) = 248.86.$
$df_i = (r - 1)(c - 1) = (10 - 1)(2 - 1) = 9.$

(e) Between-rows ($s_r^2$), between-columns ($s_c^2$) and interaction ($s_i^2$) variances are next computed using the respective sums of squares and their degrees of freedom.

$$s_r^2 = \frac{SS_r}{df_r} = \frac{946.66}{9} = 105.18; \qquad s_c^2 = \frac{SS_c}{df_c} = \frac{204.80}{1} = 204.80; \qquad s_i^2 = \frac{SS_i}{df_i} = \frac{248.86}{9} = 27.65.$$

(f) F ratios are next worked out for $s_r^2$ and $s_c^2$.

$$F_r = \frac{s_r^2}{s_i^2} = \frac{105.18}{27.65} = 3.80; \qquad df : df_r, df_i = 9, 9.$$

$$F_c = \frac{s_c^2}{s_i^2} = \frac{204.80}{27.65} = 7.41; \qquad df : df_c, df_i = 1, 9.$$


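Table 11.21. Anova table for tracheal ventilation data.

Sources of variation      Sums of squares      df      Variances      F
Between rows              946.66               9       105.18         3.80
Between columns           204.80               1       204.80         7.41
Interaction               248.86               9       27.65
Total                     1400.32              19

$F_{0.05(9,9)}$ = 3.18, for comparing with $F_r$.
$F_{0.05(1,9)}$ = 5.12, for comparing with $F_c$.

As the computed $F_c$ is higher than the critical $F_{0.05(1,9)}$, there is a significant effect of hailstorm on tracheal ventilation ($P$ < 0.05). As the computed $F_r$ is higher than the critical $F_{0.05(9,9)}$, there is a significant added variance component due to variations between the insects of the sample ($P$ < 0.05).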
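The computations of Example 11.6.1 can be retraced with a short Python sketch. It is an illustration only: the function name `two_way_no_replication` is hypothetical, and the scores are those of Table 11.20. Computed directly from the row means, the between-rows sum of squares comes to about 946.60, marginally different from the printed 946.66; the F ratios agree with the printed values to within rounding and the conclusions are unaffected.

```python
def two_way_no_replication(table):
    """Two-way anova without replication.

    table[i][j] is the single score in row i, column j.
    """
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    ss_t = sum((x - grand) ** 2 for row in table for x in row)
    ss_r = c * sum((m - grand) ** 2 for m in row_means)
    ss_c = r * sum((m - grand) ** 2 for m in col_means)
    ss_i = ss_t - (ss_r + ss_c)            # residual or interaction SS

    s2_r = ss_r / (r - 1)
    s2_c = ss_c / (c - 1)
    s2_i = ss_i / ((r - 1) * (c - 1))      # denominator variance
    return {"SS_t": ss_t, "SS_r": ss_r, "SS_c": ss_c, "SS_i": ss_i,
            "F_r": s2_r / s2_i, "F_c": s2_c / s2_i}


# Scores of Table 11.20: one row per insect, columns = before, after.
before = [75.2, 86.0, 78.0, 75.4, 84.5, 65.0, 72.0, 88.0, 71.9, 68.0]
after = [65.2, 79.6, 80.0, 70.4, 75.5, 74.2, 68.0, 74.0, 59.1, 54.0]
anova = two_way_no_replication([list(pair) for pair in zip(before, after)])
```

The critical F values still have to be read from an F table, as in the worked solution.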


Example 11.6.2.

The blood hemoglobin levels (g dL$^{-1}$) were determined in 10 anemic persons before starting any treatment, in the course of oral administration of ferrous sulfate, and again after 6 weeks of such administration. The data are presented in Table 11.22. Apply two-way anova to interpret the observed data ($\alpha$ = 0.01).

Solution :

Durations of treatment constitute a treatment variable and the individuals are considered to constitute a random classification variable. A model III two-way anova without replication may be applied to the data.

Table 11.22. Classification table used for anova of hemoglobin data.

Individuals        Duration of treatment ($X_{rc}$)


(a) The sum of scores of each row ($\sum X_{r.}$) and that of each column ($\sum X_{.c}$) are computed in Table 11.22 and used in computing the row mean ($\bar{X}_{r.}$), the column mean ($\bar{X}_{.c}$) and the grand mean ($\bar{X}$). For each row, $n = 3$ and $\bar{X}_{r.} = \sum X_{r.}/n$; for each column, $n = 10$ and $\bar{X}_{.c} = \sum X_{.c}/n$; for the grand mean, $\bar{X} = \sum X_{rc}/N$, each score being represented as $X_{rc}$. For example,

$$N = rc = 10 \times 3 = 30. \qquad \bar{X} = \frac{\sum X_{rc}}{N} = \frac{336.0}{30} = 11.20.$$

$$\bar{X}_{.1} = \frac{\sum X_{.1}}{n} = \frac{90.0}{10} = 9.00. \qquad \bar{X}_{1.} = \frac{\sum X_{1.}}{n} = \frac{31.8}{3} = 10.6.$$

(b) $SS_t = \sum(X_{rc} - \bar{X})^2 = (8.2 - 11.2)^2 + (7.2 - 11.2)^2 + \ldots + (13.3 - 11.2)^2 + (14.4 - 11.2)^2 = 158.94.$
$df_t = N - 1 = 30 - 1 = 29.$

(c) $SS_r = c\sum(\bar{X}_{r.} - \bar{X})^2 = 3[(10.6 - 11.2)^2 + \ldots + (11.3 - 11.2)^2] = 36.79.$ $df_r = r - 1 = 10 - 1 = 9.$

$$s_r^2 = \frac{SS_r}{r - 1} = \frac{36.79}{9} = 4.09.$$

(d) $SS_c = r\sum(\bar{X}_{.c} - \bar{X})^2 = 111.80.$ $df_c = c - 1 = 3 - 1 = 2.$

$$s_c^2 = \frac{SS_c}{c - 1} = \frac{111.80}{2} = 55.90.$$

(e) $SS_i = SS_t - (SS_r + SS_c) = 158.94 - (36.79 + 111.80) = 10.35.$
$df_i = (r - 1)(c - 1) = (10 - 1)(3 - 1) = 18;$

$$s_i^2 = \frac{SS_i}{(r - 1)(c - 1)} = \frac{10.35}{18} = 0.58.$$

(f) Provided it can be assumed that there is no interaction between the two independent variables,

$$F_r = \frac{s_r^2}{s_i^2} = \frac{4.09}{0.58} = 7.05; \qquad df : 9, 18.$$

$$F_c = \frac{s_c^2}{s_i^2} = \frac{55.90}{0.58} = 96.38; \qquad df : 2, 18.$$


Table 11.23. Anova table for hemoglobin data.

Sources of variation      Sums of squares      df      Variances      F
Between rows              36.79                9       4.09           7.05
Between columns           111.80               2       55.90          96.38
Remainder                 10.35                18      0.58
Total                     158.94               29




The computed $F_r$ is greater than the critical $F_{.01(9,18)}$ which amounts to 3.60 (Table H). Similarly, the computed $F_c$ is greater than the critical $F_{.01(2,18)}$ which amounts to 6.01 (Table H). Thus, both $F_r$ and $F_c$ are significant ($P$ < 0.01). Hence, there is a significant effect of the treatment variable (column effect) as well as a significant added variance component due to the random effects between individuals (row effect).


Example 11.6.3. 

Different combinations of three levels of a fixed treatment variable (A) and two levels of another treatment variable (B) were administered to the groups of a sample of 30 individuals to study their effects on a particular dependent variable. The scores of the dependent variable, measured after such treatments, are arranged in a two-way classification table (Table 11.24). Find the significance of the effects of the treatments and also of their interaction effects ($\alpha$ = 0.01).

Solution :

A two-way model I anova with replications may be used. Three levels of variable A are represented along the columns $c_1$, $c_2$ and $c_3$ (column effect) while two levels of variable B are represented along the rows $r_1$ and $r_2$ (row effect) in Table 11.24. Each combination was replicated with 5 individuals ($n$ = 5).


Table 11.24. Two-way classification table for scores of the dependent variable at different levels of two "fixed" treatment variables.

Variable B        Variable A
                  $c_1$      $c_2$      $c_3$
$r_1$             7          10         14
                  9          11         17
                  8          12         18
                  6          9          12
                  10         13         19
$r_2$             9          18         26
                  12         17         25
                  10         21         32
                  8          16         26
                  11         18         26

$n$ = 5. $N = nrc = 5 \times 2 \times 3 = 30$.



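The data are repeated in Table 11.25 for computations.

(a) Means are first computed for cells, rows and columns.

1. Cell means ($\bar{X}_{rc}$) : $\bar{X}_{rc} = \dfrac{\sum X_{rc}}{n}$.

2. Row means ($\bar{X}_{r.}$) : $\bar{X}_{r.} = \dfrac{\sum X_{r.}}{nc}$.

3. Column means ($\bar{X}_{.c}$) : $\bar{X}_{.c} = \dfrac{\sum X_{.c}}{nr}$.

4. Grand mean ($\bar{X}$) : $\bar{X} = \dfrac{\sum X_{rci}}{N}$.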



(b) The sums of squares are next computed.

1. $SS_r = nc[\sum(\bar{X}_{r.} - \bar{X})^2] = 5 \times 3[(11.67 - 15.00)^2 + (18.33 - 15.00)^2] = 332.67.$
$df_r = r - 1 = 2 - 1 = 1.$

2. $SS_c = nr[\sum(\bar{X}_{.c} - \bar{X})^2] = 5 \times 2[(9 - 15)^2 + (14.5 - 15)^2 + (21.5 - 15)^2] = 785.00.$
$df_c = c - 1 = 3 - 1 = 2.$

3. $SS_{rc} = n[\sum(\bar{X}_{rc} - \bar{X})^2] = 5[(8 - 15)^2 + (11 - 15)^2 + (16 - 15)^2 + (10 - 15)^2 + (18 - 15)^2 + (27 - 15)^2] = 1220.00.$
$df_{rc} = rc - 1 = 2 \times 3 - 1 = 5.$

4. $SS_i = SS_{rc} - SS_c - SS_r = 1220.00 - 785.00 - 332.67 = 102.33.$
$df_i = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2.$

5. $SS_w = \sum(X_{rci} - \bar{X}_{rc})^2 = (7 - 8)^2 + (9 - 8)^2 + (8 - 8)^2 + \ldots + (26 - 27)^2 + (26 - 27)^2 = 110.00.$
$df_w = N - rc = 30 - 2 \times 3 = 24.$

(c) The variance estimates are computed dividing each SS with its $df$.

$$s_r^2 = \frac{SS_r}{df_r} = \frac{332.67}{1} = 332.67; \qquad s_c^2 = \frac{SS_c}{df_c} = \frac{785.00}{2} = 392.50;$$

$$s_i^2 = \frac{SS_i}{df_i} = \frac{102.33}{2} = 51.17; \qquad s_w^2 = \frac{SS_w}{df_w} = \frac{110.00}{24} = 4.58.$$



Table 11.25. Table for computing two-way model I anova.

Variable B                        Variable A                         Row sums           Row means
                                  $c_1$      $c_2$      $c_3$        ($\sum X_{r.}$)    ($\bar{X}_{r.}$)
$r_1$                             7          10         14
                                  9          11         17
                                  8          12         18
                                  6          9          12
                                  10         13         19
  $\sum X_{rc}$                   40         55         80           175                11.67
  $\bar{X}_{rc}$                  8          11         16
$r_2$                             9          18         26
                                  12         17         25
                                  10         21         32
                                  8          16         26
                                  11         18         26
  $\sum X_{rc}$                   50         90         135          275                18.33
  $\bar{X}_{rc}$                  10         18         27
Column sums ($\sum X_{.c}$)       90         145        215          450 ($\sum X_{rci}$)
Column means ($\bar{X}_{.c}$)     9          14.5       21.5                            15.0 ($\bar{X}$)


(d) The F ratios are computed for testing the row effect, the column effect and the interaction effect.

$$F_r = \frac{s_r^2}{s_w^2} = \frac{332.67}{4.58} = 72.635; \qquad F_c = \frac{s_c^2}{s_w^2} = \frac{392.50}{4.58} = 85.699; \qquad F_i = \frac{s_i^2}{s_w^2} = \frac{51.17}{4.58} = 11.172.$$

(e) Each computed F ratio has two degrees of freedom, viz., those of its numerator and denominator variances. Thus,

for $F_r$ : $df_r, df_w$ = 1, 24 ; for $F_c$ : $df_c, df_w$ = 2, 24 ; for $F_i$ : $df_i, df_w$ = 2, 24.

So, $F_r$ should be compared with critical $F_{.01(1,24)}$ while $F_c$ and $F_i$ should be compared with critical $F_{.01(2,24)}$ from Table H.

Critical $F_{.01(1,24)}$ = 7.82 ; $F_r$ = 72.64 ; $\therefore P$ < 0.01.

Critical $F_{.01(2,24)}$ = 5.61 ; $F_c$ = 85.70 ; $\therefore P$ < 0.01 ; $F_i$ = 11.17 ; $\therefore P$ < 0.01.

Hence, all the computed F ratios are significant ($P$ < 0.01). So, the effects of both treatment variables (row effect and column effect) as well as the effect of their interaction are significant on the dependent variable.


Table 11.26. Anova table for the data of Table 11.24.

Sources of variation      Sums of squares      df      Variances      F
Between rows              332.67               1       332.67         72.635
Between columns           785.00               2       392.50         85.699
Interaction               102.33               2       51.17          11.172
Within cells              110.00               24      4.58
Total                     1330.00              29



(f) The omega squares are computed for all three significant effects, using the total sum of squares ($SS_t$) worked out in Table 11.26.

$$\omega_c^2 = \frac{SS_c - (c - 1)s_w^2}{s_w^2 + SS_t} = \frac{785 - 2 \times 4.58}{4.58 + 1330} = 0.581.$$

$$\omega_r^2 = \frac{SS_r - (r - 1)s_w^2}{s_w^2 + SS_t} = \frac{332.67 - 4.58}{4.58 + 1330} = 0.246.$$

$$\omega_i^2 = \frac{SS_i - (r - 1)(c - 1)s_w^2}{s_w^2 + SS_t} = \frac{102.33 - 2 \times 4.58}{4.58 + 1330} = 0.070.$$

So, the column effect, the row effect and the interaction effect are related to 0.581, 0.246 and 0.070
proportions, respectively, of the total variance of the dependent variable scores.
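As a cross-check on steps (c) to (f), the F ratios and omega squares can be recomputed from the sums of squares of Table 11.26. The Python sketch below is illustrative only; the function name `model1_summary` and the dictionary keys are hypothetical. It keeps the error variance unrounded (110/24 instead of 4.58), so the F ratios differ slightly from the printed ones, while the omega squares agree to three decimals.

```python
def model1_summary(ss, df, ss_total):
    """F ratios and omega squares for a two-way model I anova.

    ss and df map the keys "r", "c", "i" and "w" to the sums of squares
    and degrees of freedom; ss_total is the total sum of squares.
    """
    s2_w = ss["w"] / df["w"]                  # within-cells (error) variance
    summary = {}
    for effect in ("r", "c", "i"):
        variance = ss[effect] / df[effect]
        f_ratio = variance / s2_w
        omega2 = (ss[effect] - df[effect] * s2_w) / (s2_w + ss_total)
        summary[effect] = (f_ratio, omega2)
    return summary


# Sums of squares and df as printed in Table 11.26.
ss = {"r": 332.67, "c": 785.00, "i": 102.33, "w": 110.00}
df = {"r": 1, "c": 2, "i": 2, "w": 24}
summary = model1_summary(ss, df, ss_total=1330.00)
```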


GLOSSARY 

added treatment component : that component of the variance of dependent variable scores in an experiment which occurs between the groups of subjects or cases in addition to the random variations measured by the within-groups variance, and results from the effects of the controlled treatment variable(s) used as the independent variable(s).

added variance component : that component of the variance of dependent variable scores in an experiment which occurs between the groups of subjects or cases in addition to the random variations measured by the within-groups variance, and results from the effects of the uncontrolled classification variable(s) used as the independent variable(s).

after-design/a-posteriori comparison : multiple comparison test of group means chosen only after scrutiny of the data already collected in the experiment.

anova : analysis of the variance of the dependent variable scores in an experiment for estimating the relative magnitudes of a variance under investigation and the error variance, for drawing inferences.

anova, fixed model or model I : anova to be worked out with the data of an experiment using independent variable(s) of only the "fixed" or controlled treatment class.

anova, mixed model or model III : anova to be applied to the data of an experiment using both controlled ("fixed") treatment(s) and uncontrolled classification variable(s) as independent variables.

anova, one-way : anova to be worked out with the data of a single-factor experiment with a single independent variable.

anova, random model or model II : anova to be worked out with the data of an experiment using independent variable(s) of only the uncontrolled classification class.

anova, two-way : anova to be worked out with the data of a factorial experiment based on application of two independent variables.

before-design/a-priori comparison : multiple comparison of such group means as have already been chosen while designing the experiment and prior to the collection of data.


error variance : variance of dependent variable scores, resulting from unknown and uncontrolled factors associated with random sampling, and used as the denominator of the F ratio.

experimental design : scientific planning of an experiment by determining the required sample size, using random sampling, fixing the levels of application of the independent variable and the number of replications of each such level, applying those levels in a random sequence, minimizing experimental errors from unwanted relevant variables, and allowing the subsequent application of anova on the collected data for interpretation.

F ratio : variance ratio in which the numerator is the variance estimate whose source and significance are under investigation, and the denominator is the error variance resulting from unknown factors associated with random sampling.

factorial experiment : experiment designed to study the changes of the dependent variable on exposure to chosen combinations of different levels of more than one independent variable.

homoscedasticity : the assumption that the groups, drawn for an experiment, initially have homogeneous variances which differ only due to sampling errors.

Kruskal-Wallis H : statistic computed in the Kruskal-Wallis nonparametric one-way anova and interpreted using critical chi square values.

levels : chosen amounts, intensities, amplitudes or categories of an independent variable, to which the dependent variable is exposed in an experiment.

multiple comparison test : test to be applied to group means, following a significant F ratio in an anova with more than two groups, to find which group means differ significantly from each other.

omega square : statistic, computed if a model I anova has yielded a significant F ratio, to estimate the strength of association between the independent variable and the dependent variable.

prospective method : an investigation in which the application of the independent variable is subsequently followed by the study of the changes in the dependent variable.

replication : that number of subjects or cases of a sample, on which each level of the independent variable is applied to minimize experimental errors.

retrospective method : an investigation to explore the past exposure of the subjects, showing at present specific changes of the dependent variable, to a chosen independent variable.

single-factor experiment : experiment designed to study the changes of the dependent variable due to the exposure of the sample to different levels of a single independent variable.

sum of squares, between-groups : the sum of squares computed by giving the weight of the respective group sizes to the squared deviation of each group mean from the grand mean of a sample.

sum of squares, total : the sum of the squared deviations of the raw scores of all groups in a sample from its grand mean.

sum of squares, within-groups : the sum of the squared deviations of the raw scores of all the groups in a sample, from the respective group means.

uncontrolled experiment : experiment using such independent variable(s) as are beyond the control of the investigator and are consequently liable to random errors.

variance, between-columns : variance due to the added effects of an independent variable represented along the columns of a classification table in a two-way anova.


variance, between-groups : variance of scores of all the groups of subjects in an experiment, computed from the weighted deviations of the respective group means from the grand mean of all the groups.

variance, between-rows : variance owing to the added effects of an independent variable represented along the rows of a classification table in a two-way anova.

variance, interaction : variance due to the joint effects of more than one independent variable in a higher order of anova.

variance, total : variance of scores of all the groups of subjects about their grand mean in an experiment. 

variance, within-cells : variance of scores within the cells of a classification table in a two-way anova, 
serving as a measure of uncontrolled variations of dependent variable scores due to random sampling. 

variance, within-groups : variance of scores of all the groups of subjects about the respective group means 
in an experiment, resulting from unknown and uncontrolled factors due to random sampling. 





STEPS IN STATISTICAL TESTS

A significance test

1. Working out the SE of the computed statistic.
   E.g., for $r$ : $s_r = \sqrt{\dfrac{1 - r^2}{n - 2}}$ ; for $\bar{X}_1 - \bar{X}_2$ : $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}$.
2. Transforming the computed statistic to a 'standard' score by division with its SE.
   E.g., $t = \dfrac{r}{s_r}$ ; $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$.
3. Working out the degrees of freedom.
4. Comparison of the computed 'standard' score (e.g., $t$ and $F$) with the critical 'standard' score, e.g., $t_{\alpha(df)}$, for the chosen $\alpha$.
   - Computed score > critical score, e.g., $t > t_{\alpha(df)}$ : $P < \alpha$ ; significant result.
   - Computed score = critical score, e.g., $t = t_{\alpha(df)}$ : $P = \alpha$ ; significant result.
   - Computed score < critical score, e.g., $t < t_{\alpha(df)}$ : $P > \alpha$ ; result not significant.
   A significant result carries a probability of Type I error of $P \le \alpha$.


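t test for ($\bar{X}_1 - \bar{X}_2$) of large independent groups

1. Construction of Table following formulae.
2. Working out of means and sums of squares ($SS$) :
   $\bar{X}_1 = \dfrac{\sum X_1}{n_1}$ ; $\bar{X}_2 = \dfrac{\sum X_2}{n_2}$ ; $SS_1 = \sum(X_1 - \bar{X}_1)^2$ ; $SS_2 = \sum(X_2 - \bar{X}_2)^2$.
3. Computation of SDs and SEs of means :
   $s_1 = \sqrt{\dfrac{\sum(X_1 - \bar{X}_1)^2}{n_1 - 1}}$ ; $s_2 = \sqrt{\dfrac{\sum(X_2 - \bar{X}_2)^2}{n_2 - 1}}$ ; $s_{\bar{X}_1} = \dfrac{s_1}{\sqrt{n_1}}$ ; $s_{\bar{X}_2} = \dfrac{s_2}{\sqrt{n_2}}$.
4. Working out $s_{\bar{X}_1 - \bar{X}_2}$ and transforming ($\bar{X}_1 - \bar{X}_2$) into $t$ :
   $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}$ ; $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$.
5. Quoting critical $t_{\alpha(df)}$ for chosen $\alpha$.
6. Comparison of computed $t$ with critical $t_{\alpha(df)}$.
   - Computed $t > t_{\alpha(df)}$ : $P < \alpha$ ; significant difference.
   - Computed $t = t_{\alpha(df)}$ : $P = \alpha$ ; significant difference.
   - Computed $t < t_{\alpha(df)}$ : $P > \alpha$ ; no significant difference.
   For a significant difference, the probability of Type I error is $P \le \alpha$.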


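t test for ($\bar{X}_1 - \bar{X}_2$) of small independent groups

1. Construction of Table following formulae.
2. Working out of means and sums of squares ($SS$) :
   $\bar{X}_1 = \dfrac{\sum X_1}{n_1}$ ; $\bar{X}_2 = \dfrac{\sum X_2}{n_2}$ ; $SS_1 = \sum(X_1 - \bar{X}_1)^2$ ; $SS_2 = \sum(X_2 - \bar{X}_2)^2$.
3. Computation of pooled SD ($s$) and SE of difference between means ($s_{\bar{X}_1 - \bar{X}_2}$) :
   $s = \sqrt{\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}}$ ; $s_{\bar{X}_1 - \bar{X}_2} = s\sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}$.
4. Transforming ($\bar{X}_1 - \bar{X}_2$) into $t$ score and working out its degrees of freedom :
   $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$ ; $df = n_1 + n_2 - 2$.
5. Quoting critical $t_{\alpha(df)}$ for chosen $\alpha$.
6. Comparison of computed $t$ with critical $t_{\alpha(df)}$.
   - Computed $t > t_{\alpha(df)}$ : $P < \alpha$ ; significant difference.
   - Computed $t = t_{\alpha(df)}$ : $P = \alpha$ ; significant difference.
   - Computed $t < t_{\alpha(df)}$ : $P > \alpha$ ; no significant difference.
   For a significant difference, the probability of Type I error is $P \le \alpha$.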


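t test for ($\bar{X}_1 - \bar{X}_2$) of small single group

1. Construction of Table following formulae.
2. Working out of mean difference ($\bar{D}$) and SD ($s_D$) of differences ($X_1 - X_2$) :
   $D = X_1 - X_2$ ; $\bar{D} = \dfrac{\sum D}{n}$ ; $s_D = \sqrt{\dfrac{\sum(D - \bar{D})^2}{n - 1}}$.
3. Computation of SE of $\bar{D}$ and transforming $\bar{D}$ to $t$ score :
   $s_{\bar{D}} = \dfrac{s_D}{\sqrt{n}}$ ; $t = \dfrac{\bar{D}}{s_{\bar{D}}}$ ; $df = n - 1$.
4. Quoting critical $t_{\alpha(df)}$ for chosen $\alpha$.
5. Comparison of computed $t$ with critical $t_{\alpha(df)}$.
   - Computed $t > t_{\alpha(df)}$ : $P < \alpha$ ; significant difference.
   - Computed $t = t_{\alpha(df)}$ : $P = \alpha$ ; significant difference.
   - Computed $t < t_{\alpha(df)}$ : $P > \alpha$ ; no significant difference.
   For a significant difference, the probability of Type I error is $P \le \alpha$.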


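Using z score for ($\bar{X}_1 - \bar{X}_2$) of large independent groups

1. Construction of Table following formulae.
2. Working out means and sums of squares ($SS$) :
   $\bar{X}_1 = \dfrac{\sum X_1}{n_1}$ ; $\bar{X}_2 = \dfrac{\sum X_2}{n_2}$ ; $SS_1 = \sum(X_1 - \bar{X}_1)^2$ ; $SS_2 = \sum(X_2 - \bar{X}_2)^2$.
3. Computation of SDs and SEs of means :
   $s_1 = \sqrt{\dfrac{\sum(X_1 - \bar{X}_1)^2}{n_1 - 1}}$ ; $s_2 = \sqrt{\dfrac{\sum(X_2 - \bar{X}_2)^2}{n_2 - 1}}$ ; $s_{\bar{X}_1} = \dfrac{s_1}{\sqrt{n_1}}$ ; $s_{\bar{X}_2} = \dfrac{s_2}{\sqrt{n_2}}$.
4. Working out $s_{\bar{X}_1 - \bar{X}_2}$ and transforming ($\bar{X}_1 - \bar{X}_2$) to $z$ :
   $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}$ ; $z = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$.
5. Working out probability ($P$) of $H_0$ being correct :
   $P$ = 2 [0.5000 - (Area of unit normal curve from $\mu$ to computed $z$)].
   - $P \le \alpha$ : significant difference.
   - $P > \alpha$ : no significant difference.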


Mann-Whitney U test for ($X_1 - X_2$)

1. Construction of Table for ranking.
2. Composite ranking of $X_1$ and $X_2$ scores/ranks in ascending order.
3. Separate rank sum ($R_i$) of each group.
4. Computation of $U$ using either rank sum :
   $U = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2} - R_1$, or $U = n_1 n_2 + \dfrac{n_2(n_2 + 1)}{2} - R_2$.
5. Working out $\bar{U}$ and its SE ($s_U$) :
   $\bar{U} = \dfrac{n_1 n_2}{2}$ ; $s_U = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$.
6. Computation of $z$ score using either $U$ :
   $z = \dfrac{U - \bar{U}}{s_U}$.
7. Working out probability ($P$) of $H_0$ being correct :
   $P$ = 2 [0.5000 - (Area of unit normal curve from $\mu$ to computed $z$)].
8. Drawing the inference.
   - $P \le \alpha$ : significant difference.
   - $P > \alpha$ : no significant difference.
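The steps of the Mann-Whitney U test can be sketched in Python as below. This is a hedged illustration, not the book's own procedure in code: the function names `rank_with_ties` and `mann_whitney_z` and the two small groups of scores are hypothetical, tied scores are given the mean of their ranks, and no tie correction is applied to $s_U$. The two-tailed $P$ is obtained from the unit normal curve through `math.erfc`, which is equivalent to the tabled-area formula. The normal approximation is meant for reasonably large groups; the tiny groups here only demonstrate the arithmetic.

```python
import math


def rank_with_ties(values):
    """Ascending ranks; tied scores receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_z(x1, x2):
    """Return U (from the rank sum of group 1), z and the two-tailed P."""
    n1, n2 = len(x1), len(x2)
    ranks = rank_with_ties(list(x1) + list(x2))
    r1 = sum(ranks[:n1])                              # rank sum of group 1
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean_u = n1 * n2 / 2
    se_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / se_u
    p = math.erfc(abs(z) / math.sqrt(2))              # 2[0.5 - area from mean to z]
    return u, z, p


# Hypothetical scores of two independent groups.
u, z, p = mann_whitney_z([1, 2, 3, 4], [5, 6, 7, 8])
```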



Chi square test for goodness of fit

1. Working out $f_e$ values for $k$ number of classes as per proposed distribution.
2. Construction of Table following formulae.
3. Entry of $f_o$ and $f_e$ values in respective columns of Table.
4. Working out $(f_o - f_e)$, $(f_o - f_e)^2$ and $\dfrac{(f_o - f_e)^2}{f_e}$ values.
5. Computation of $\chi^2$ and its degrees of freedom :
   $\chi^2 = \sum\dfrac{(f_o - f_e)^2}{f_e}$ ; $df = k - m$ ;
   where $m$ = 1, 2 or 3 for respectively Mendelian, binomial and normal distributions.
6. Comparison of computed $\chi^2$ with critical $\chi^2_{\alpha(df)}$ for chosen $\alpha$.
   - $\chi^2 > \chi^2_{\alpha(df)}$ or $\chi^2 = \chi^2_{\alpha(df)}$ : $\chi^2$ significant ; no significant goodness of fit.
   - $\chi^2 < \chi^2_{\alpha(df)}$ : $\chi^2$ not significant ; significant goodness of fit.
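A minimal Python sketch of steps 4 and 5 follows. The function name `chi_square_gof` and the observed counts are hypothetical; the expected frequencies assume a 3 : 1 Mendelian ratio among 100 offspring, so $m$ = 1.

```python
def chi_square_gof(observed, expected, m=1):
    """Chi square for goodness of fit and its df = k - m."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return chi2, len(observed) - m


# Hypothetical counts tested against a 3 : 1 Mendelian ratio (N = 100).
observed = [70, 30]
expected = [75.0, 25.0]
chi2, df = chi_square_gof(observed, expected, m=1)
```

Here the computed $\chi^2$ is about 1.33 with $df$ = 1, which is lower than the critical $\chi^2_{.05(1)}$ of 3.84; the hypothetical counts therefore show a significant goodness of fit with the proposed ratio.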


Product-moment correlation using sum of products

1. Construction of Table following formulae.
2. Working out of means of $X$ and $Y$ variables :
   $\bar{X} = \dfrac{\sum X}{n}$ ; $\bar{Y} = \dfrac{\sum Y}{n}$.
3. Computation of sums of squares ($SS$) and sum of products ($SP$) of $X$ and $Y$ variables :
   $SS_X = \sum(X - \bar{X})^2$ ; $SS_Y = \sum(Y - \bar{Y})^2$ ; $SP = \sum(X - \bar{X})(Y - \bar{Y})$.
4. Working out product-moment $r$ :
   $r = \dfrac{SP}{\sqrt{SS_X \cdot SS_Y}}$.
5. Working out SE ($s_r$) of $r$ and transforming $r$ into $t$ score :
   $s_r = \sqrt{\dfrac{1 - r^2}{n - 2}}$ ; $t = \dfrac{r}{s_r}$ ; $df = n - 2$.
6. Quoting critical $t_{\alpha(df)}$ for chosen $\alpha$.
7. Comparison of computed $t$ with critical $t_{\alpha(df)}$.
   - Computed $t > t_{\alpha(df)}$ : $P < \alpha$ ; significant.
   - Computed $t = t_{\alpha(df)}$ : $P = \alpha$ ; significant.
   - Computed $t < t_{\alpha(df)}$ : $P > \alpha$ ; not significant.
   For a significant $r$, the probability of Type I error is $P \le \alpha$.


Product-moment correlation using raw scores

1. Construction of Table following formulae.
2. Computation of $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$ and $\sum XY$ of variables $X$ and $Y$.
3. Working out product-moment $r$ :
   $r = \dfrac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$.
4. Working out SE ($s_r$) of $r$ and transforming $r$ into $t$ score :
   $s_r = \sqrt{\dfrac{1 - r^2}{n - 2}}$ ; $t = \dfrac{r}{s_r}$ ; $df = n - 2$.
5. Quoting critical $t_{\alpha(df)}$ for chosen $\alpha$.
6. Comparison of computed $t$ with critical $t_{\alpha(df)}$.
   - Computed $t > t_{\alpha(df)}$ : $P < \alpha$ ; significant.
   - Computed $t = t_{\alpha(df)}$ : $P = \alpha$ ; significant.
   - Computed $t < t_{\alpha(df)}$ : $P > \alpha$ ; not significant.
   For a significant $r$, the probability of Type I error is $P \le \alpha$.
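The raw-score route to the product-moment $r$ and its $t$ score can be sketched in Python as follows. The function name `product_moment_r` and the five pairs of scores are hypothetical; the sketch assumes that $r$ is not exactly +1 or -1, since $s_r$ would then be zero.

```python
import math


def product_moment_r(x, y):
    """Product-moment r from raw scores, with its t score and df."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    s_r = math.sqrt((1 - r * r) / (n - 2))    # SE of r
    return r, r / s_r, n - 2


# Hypothetical paired scores of X and Y.
r, t, df = product_moment_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

The computed $t$ is then compared with the critical $t_{\alpha(df)}$ read from a $t$ table.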


Simple linear regression of Y on X using sum of products and sum of squares

1. Construction of Table using formulae.
2. Computation of means of $X$ and $Y$ variables :
   $\bar{X} = \dfrac{\sum X}{n}$ ; $\bar{Y} = \dfrac{\sum Y}{n}$.
3. Computation of sum of products ($SP$) of $X$ and $Y$ and sum of squares ($SS$) of $X$ :
   $SP = \sum(X - \bar{X})(Y - \bar{Y})$ ; $SS_X = \sum(X - \bar{X})^2$.
4. Computation of regression coefficient :
   $b_{YX} = \dfrac{\sum(X - \bar{X})(Y - \bar{Y})}{\sum(X - \bar{X})^2}$.
5. Computation of $a_{YX}$ and regression equation of $Y$ on $X$ :
   $a_{YX} = \bar{Y} - b_{YX}\bar{X}$ ; $\hat{Y} = a_{YX} + b_{YX}X$.
6. Computation of 5 $\hat{Y}$ scores with 5 chosen $X$ scores and plotting each $\hat{Y}$ against corresponding $X$ to draw the regression line.


Scanned by CamScanner 








STEPS IN STATISTICAL TESTS 


339 


Simple linear regression of Y on X using raw scores

Construction of Table using formulae

↓

Computation of $\Sigma X$, $\Sigma Y$, $\Sigma X^2$ and $\Sigma XY$ of X and Y

↓

Computation of means of X and Y

$\bar{X} = \dfrac{\Sigma X}{n}$ ; $\bar{Y} = \dfrac{\Sigma Y}{n}$.

↓

Computation of regression coefficient

$b_{YX} = \dfrac{n\Sigma XY - \Sigma X\,\Sigma Y}{n\Sigma X^2 - (\Sigma X)^2}$

↓

Computation of $a_{YX}$ and regression equation of Y on X

$a_{YX} = \bar{Y} - b_{YX}\bar{X}$ ; $\hat{Y} = a_{YX} + b_{YX}X$.

↓

Computation of 5 $\hat{Y}$ scores with 5 chosen X scores and plotting each $\hat{Y}$ against corresponding X to draw the regression line.
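
The raw-score version can be sketched in the same way. The function name `regression_raw` and the scores are hypothetical; the slope uses the standard raw-score form $b_{YX} = (n\Sigma XY - \Sigma X\Sigma Y)/(n\Sigma X^2 - (\Sigma X)^2)$.

```python
def regression_raw(x, y):
    """Regression of Y on X from raw-score totals."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a_yx = sum_y / n - b_yx * (sum_x / n)  # mean_Y - b * mean_X
    return a_yx, b_yx

# Hypothetical paired scores, for illustration only
x = [2, 4, 6, 8]
y = [3, 7, 5, 10]
a_yx, b_yx = regression_raw(x, y)
print(f"Predicted Y = {a_yx:.2f} + {b_yx:.2f} X")
```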




Kruskal-Wallis nonparametric anova

Construction of Table for ranking

↓

Composite ranking of scores/ranks of all k groups together in ascending order

↓

Working out separate rank sum ($R_i$) of each group

↓

Computation of mean rank ($\bar{R}_i$) of each group and mean rank ($\bar{R}$) of all the groups

$\bar{R}_i = \dfrac{R_i}{n_i}$ ; $\bar{R} = \dfrac{\Sigma R_i}{N}$.

↓

Working out statistic H using $\bar{R}_i$ and $\bar{R}$ values

$N = \Sigma n_i$ ; $H = \dfrac{12}{N(N+1)}\,\Sigma n_i(\bar{R}_i - \bar{R})^2$ ; $df = k-1$.

↓

Quoting critical $\chi^2_{\alpha(df)}$ for chosen $\alpha$

↓

Comparison of computed H with critical $\chi^2_{\alpha(df)}$
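
The ranking and the H statistic may be sketched as below. This is an assumption-laden illustration: the helper names are hypothetical, tied scores receive the average of their ranks, no tie correction is applied to H, and H is computed in its mean-rank form, $H = \frac{12}{N(N+1)}\Sigma n_i(\bar{R}_i - \bar{R})^2$.

```python
def composite_ranks(groups):
    """Rank all scores of all groups together in ascending order (ties share the mean rank)."""
    pooled = sorted(score for group in groups for score in group)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i+1 .. j hold the same score; assign their average rank
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    return [[rank_of[score] for score in group] for group in groups]

def kruskal_wallis_h(groups):
    """Return (H, df) from group mean ranks and the overall mean rank."""
    ranks = composite_ranks(groups)
    sizes = [len(group) for group in groups]
    n_total = sum(sizes)
    rank_sums = [sum(group_ranks) for group_ranks in ranks]
    mean_ranks = [rs / n for rs, n in zip(rank_sums, sizes)]
    overall_mean_rank = sum(rank_sums) / n_total
    h = 12 / (n_total * (n_total + 1)) * sum(
        n * (m - overall_mean_rank) ** 2 for n, m in zip(sizes, mean_ranks)
    )
    return h, len(groups) - 1

# Hypothetical scores of k = 3 groups, for illustration only
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h, df = kruskal_wallis_h(groups)
critical_chi2 = 5.991  # critical chi-square for alpha = 0.05 and df = 2 (table value)
print(f"H = {h:.2f}, df = {df}, significant: {h >= critical_chi2}")
```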






One-way anova

Construction of Table following formulae

↓

Working out $\Sigma X_1, \Sigma X_2, \ldots, \Sigma X_k$, $\Sigma X_1^2, \Sigma X_2^2, \ldots, \Sigma X_k^2$ using raw scores of all k number of groups

↓

Computation of $SS_t$ from raw scores

$N = n_1 + n_2 + \ldots + n_k$ ;

$SS_t = (\Sigma X_1^2 + \Sigma X_2^2 + \ldots + \Sigma X_k^2) - \dfrac{(\Sigma X_1 + \Sigma X_2 + \ldots + \Sigma X_k)^2}{N}$.

↓

Computation of $SS_b$ and its degrees of freedom

$SS_b = \dfrac{(\Sigma X_1)^2}{n_1} + \dfrac{(\Sigma X_2)^2}{n_2} + \ldots + \dfrac{(\Sigma X_k)^2}{n_k} - \dfrac{(\Sigma X_1 + \Sigma X_2 + \ldots + \Sigma X_k)^2}{N}$ ; $df_b = k-1$.

↓

Working out $SS_w$ and its degrees of freedom

$SS_w = SS_t - SS_b$ ; $df_w = N - k$.

↓

Working out variances and their ratio

$s_b^2 = \dfrac{SS_b}{df_b}$ ; $s_w^2 = \dfrac{SS_w}{df_w}$ ; $F = \dfrac{s_b^2}{s_w^2}$.

↓

Comparison of computed F with critical $F_{\alpha(df_b,\,df_w)}$ for chosen $\alpha$.

Computed $F \geq F_\alpha$ : $P \leq \alpha$ ; significant difference.
    With 2 groups : omega square for Model I anova ; added variance component for Model II anova.
    With >2 groups : omega square for Model I anova ; added variance component for Model II anova ; multiple comparison tests for chosen pairs of groups.

Computed $F < F_\alpha$ : $P > \alpha$ ; no significant difference.
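
The computation of the F ratio from raw scores can be sketched as follows. The function name `one_way_anova`, the dictionary keys and the scores are hypothetical; the sums of squares follow the usual raw-score formulae (total SS and between-groups SS each less the correction term $(\Sigma X)^2/N$).

```python
def one_way_anova(groups):
    """One-way anova from raw scores: SS_t, SS_b, SS_w, their df and the F ratio."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    group_sums = [sum(group) for group in groups]
    sum_of_squared_scores = sum(x * x for group in groups for x in group)
    correction = sum(group_sums) ** 2 / n_total  # (sum of all scores)^2 / N
    ss_t = sum_of_squared_scores - correction
    ss_b = sum(s ** 2 / len(g) for s, g in zip(group_sums, groups)) - correction
    ss_w = ss_t - ss_b
    df_b, df_w = k - 1, n_total - k
    f_ratio = (ss_b / df_b) / (ss_w / df_w)  # between-groups variance / within-groups variance
    return {"ss_t": ss_t, "ss_b": ss_b, "ss_w": ss_w,
            "df_b": df_b, "df_w": df_w, "F": f_ratio}

# Hypothetical scores of k = 3 groups, for illustration only
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(groups)
critical_f = 5.14  # critical F for alpha = 0.05 with df 2 and 6 (F table)
print(result, "significant:", result["F"] >= critical_f)
```

Omega square, the added variance component and multiple comparison tests are not covered by this sketch.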















SAMPLE QUESTIONS 


CHAPTER 1

1. Complete each of the following statements by choosing and marking with a tick (✓) the correct alternative.

(a) Ratio scale is used to express the values of :

(i) body temperature, (ii) intelligence, (iii) femur length.

(b) Stratified random sampling is used for drawing a sample from a population which is :

(i) small and homogeneous, (ii) large and heterogeneous, (iii) widespread and vast.

(c) With respect to a nominal variable, individuals of a population can be subjected to :

(i) serial gradations into ranks, (ii) assessments of only qualitative differences, (iii) quantitative measurements.

(d) Of the following, the only discrete variable is :

(i) blood volume, (ii) body weight, (iii) respiratory rate.

(e) Interval scale has to be used to express the values of :

(i) heart rate, (ii) body temperature, (iii) cell count.

(f) Multistage sampling is used where the population is :

(i) vast and widespread, (ii) small and homogeneous, (iii) large and heterogeneous.

(g) Of the following, the only continuous variable is :

(i) interorbital width, (ii) sex, (iii) ferocity.

2. Match each item of Column 1 with the correct item in Column 2, mentioning the serial number of the latter in the space after the former.

Column 1

(a) ordinal variable _____
(b) absolute measure of dispersion _____
(c) discrete variable _____
(d) statistics of location _____
(e) prediction statistic _____
(f) nominal variable _____
(g) sampling statistic _____
(h) continuous variable _____

Column 2

(i) median
(ii) regression coefficient
(iii) standard error
(iv) coefficient of variation
(v) litter size
(vi) body height
(vii) variance
(viii) sex
(ix) personality


3. Fill up the blanks in the following statements with the correct words chosen from among those given within the parentheses below :

(a) Coefficient of variation is a _____ measure of dispersion while an independent variable strictly controlled by the investigator is a _____ variable.

(b) An experiment studies the effects of the _____ variable on a _____ variable.

(c) Quartile deviation is the _____ measure of dispersion while a _____ variable is an uncontrolled independent variable in an experiment.

(d) Simple random sampling is applied on a _____ population while quota sampling is done with a _____ population.

(e) A summary value of the scores of a variable for a population is a _____ while that for a sample is a _____.

(f) Fixed interval sampling is based on _____ while judgment sampling depends on the _____ by the investigator.

(stratified, absolute, homogeneous, treatment, relative, absolute, dependent, statistic, parameter, choice, classification, independent, probabilities)

4. Mark the odd item in each of the following series with a tick (✓) mark.

(a) : (i) blood volume, (ii) blood group, (iii) blood sugar, (iv) blood pressure.

(b) : (i) multistage sampling, (ii) stratified random sampling, (iii) fixed interval sampling, (iv) judgement sampling.

(c) : (i) standard error, (ii) mean, (iii) standard deviation, (iv) correlation coefficient.

(d) : (i) standard deviation, (ii) mean deviation, (iii) percentile, (iv) range.

5. (a) Explain what you mean by statistic and parameter. 

(b) Classify statistics, defining each class with suitable examples. 

(c) What do you mean by a point estimate and an interval estimate of a parameter ? 

6. (a) What is the difference between probability sampling and judgement sampling ? 

(b) Discuss how and when the following methods of sampling are undertaken : simple random sampling, 
multistage sampling and stratified random sampling. 

(c) What is incidental sampling ? 

7. (a) What are the basic characteristics of measurement variables ? 

(b) Classify measurement variables, describing each class briefly with examples. 

(c) Define derived variables with examples. 

8. (a) Explain what you mean by dependent, independent and relevant variables with respect to experiments.

(b) Classify independent variables of experiments, describing each class with suitable examples. 

(c) Describe different classes of relevant variables with examples. 

9. (a) Explain the terms population and sample. 

(b) Explain why a sample should be used, instead of the entire population, for an experiment. 

(c) Write briefly about random samplings, respectively with and without replacement.




CHAPTER 2

1. Complete each of the following statements by choosing and marking with a tick (✓) the correct alternative.

(a) Ogives are graphical representations of :

(i) bivariate frequency distributions, (ii) qualitative frequency distributions, (iii) cumulative frequency distributions.

(b) Frequency polygons are area diagrams of :

(i) discrete frequency distributions, (ii) continuous frequency distributions, (iii) qualitative frequency distributions.

(c) The frequency distribution of a discrete variable may be diagrammatically represented by :

(i) histogram, (ii) frequency polygon, (iii) bar diagram.

(d) Frequency distributions of nominal variables can be represented by :

(i) frequency polygon, (ii) pie diagram, (iii) histogram.

(e) True class limits are shown in the frequency distributions of :

(i) continuous variables, (ii) nominal variables, (iii) discontinuous variables.

(f) All individuals in a class interval of a quantitative frequency distribution are assumed to possess the score identical with its :

(i) upper true limit, (ii) lower true limit, (iii) midpoint.

(g) The less-than cf of an interval of a distribution is the sum of the frequencies from the lowest interval to the :

(i) midpoint, (ii) upper true limit, (iii) lower true limit of the relevant interval.

2. Match each item of Column 1 with the correct item in Column 2, putting the serial number of the latter in the space after the former.

Column 1

(a) scattergram _____
(b) cf ogive _____
(c) point distribution _____
(d) histogram _____
(e) proportional bar diagram _____

Column 2

(i) cumulative frequency
(ii) discrete variable
(iii) cumulative percentage
(iv) bivariate distribution
(v) continuous variable
(vi) nominal variable

3. Fill up the blanks in the following statements with the correct words chosen from among those given within the parentheses below :

(a) The form of association between two _____ variables is graphically indicated by their _____.

(b) The polygon of a _____ frequency distribution is obtained by plotting the frequency of each class interval against the _____ of that interval.

(c) An ogive may be drawn by plotting the _____ of each interval against its _____.

(d) A pie diagram loses precision if there are too _____ classes in the distribution while smoothening of a frequency polygon decreases its _____.

(jaggedness, scattergram, cf, few, midpoint, $X_u$, measurement, many, continuous)




4. Mark the odd item in each of the following series with a tick (✓) mark.

(a) : (i) frequency table ; (ii) frequency distribution ; (iii) continuous variable ; (iv) frequency polygon.

(b) : (i) midpoint ; (ii) continuous distribution ; (iii) true class limits ; (iv) point distribution.

(c) : (i) cumulative frequencies ; (ii) pie diagram ; (iii) ogive ; (iv) cumulative percentages.

(d) : (i) simple bar diagram ; (ii) pie diagram ; (iii) histogram ; (iv) proportional bar diagram.

5. (a) Describe the working out of a frequency distribution from the raw scores of a continuous variable.

(b) Write how a frequency polygon is plotted from a continuous frequency distribution.

(c) Discuss the advantages and disadvantages of using the frequency polygon.

6. (a) Describe the working out of the cumulative percentage up to the true upper limit of each class interval from the grouped data of a continuous frequency distribution.

(b) How would you draw an ogive using the cumulative percentages thus computed ?

(c) Using the following frequency distribution of winglengths (mm) of a sample of cockroaches, work out the cumulative percentages up to the true upper limits of the respective class intervals.

Class intervals (score limits) :   22-25   26-29   30-33   34-37   38-41
Frequencies :                        5      10      20       9       6

7. (a) Describe how you would draw a histogram for the grouped data of a frequency distribution.

(b) Mention the merits and demerits of a histogram for the graphical representation of frequency distributions.

(c) Draw a frequency polygon using the data given in Question 6 (c). How can you work out a "smoothed" polygon from it ?

8. (a) Describe the simple and multiple bar diagrams as the representations of frequency distributions.

(b) Draw a multiple bar diagram to represent the following frequency distributions of phenotypes in two Drosophila samples from two habitats.

Phenotypes :    grey-body   grey-body     black-body   black-body
                red-eye     scarlet-eye   red-eye      scarlet-eye
Frequencies :
Sample 1 :      90          28            32           10
Sample 2 :      80          35            35           10

(c) Describe the drawing of a pie diagram and its use.

9. (a) Tabulate the following bodyweight (kg) data of a sample of humans into a frequency distribution, having five suitable class intervals.

57, 78, 57, 72, 68, 68, 56, 79, 65, 71, 74, 71, 68, 67, 67, 70, 74, 70, 59, 62, 64, 62, 65, 68, 61, 77,
58, 77, 65, 63, 73, 65, 63, 73, 64, 66, 64, 67, 73, 67.

(b) Draw a histogram for the representation of this frequency distribution.

(c) Work out the distribution of cumulative frequencies up to the true upper limits of the respective class intervals of the above-mentioned frequency distribution.




CHAPTER 3

1. Complete each of the following statements by choosing and marking with a tick (✓) mark the correct alternative given below.

(a) In a distribution which is perfectly symmetrical :

(i) median and mode are respectively higher and lower than the mean,

(ii) mean, median and mode are identical,

(iii) both mode and median are higher than the mean.

(b) For an incomplete frequency distribution with open class interval(s) :

(i) mean cannot be computed though median and mode can still be computed,

(ii) mean and median cannot be computed but mode can still be worked out,

(iii) mean and mode cannot be worked out though median can still be computed.

(c) $P_{12}$ is a quantile which belongs to the class of :

(i) deciles, (ii) quartiles, (iii) percentiles.

(d) The median bisects the area of the distribution in halves :

(i) equal in area and symmetrical, (ii) unequal in area and bilaterally symmetric, (iii) equal in area.

(e) The sum of deviations of all the scores of a sample from its mean amounts to :

(i) a positive integer, (ii) zero, (iii) a negative integer.

2. Fill up the blanks in the following statements choosing the correct words from amongst those within the parentheses below.

(a) In incomplete frequency distributions with open class intervals, you can compute the _____ but not the _____.

(b) An asymmetric distribution causes no deflection of the _____, but the maximum deflection of the _____.

(c) Below the median lie _____ of the scores of a sample while _____ of the scores lie below the first decile.

(d) The fourth quartile of a frequency distribution is that score below which lie _____ of the scores of a sample while below the fourth percentile lies _____ of all the scores.

(all, half, mean, median, mode, one-tenth, mean, one-fourth, 0.04.)

3. (a) Discuss the properties of the mean.

(b) Describe how you would compute the mean of the scores grouped into a continuous frequency distribution.

(c) Compute the mean winglength (mm) of a sample of cockroaches using the frequency distribution given in Question 6 (c) of Chapter 2.

4. (a) Describe the properties of median.

(b) Work out the median and the 3rd quartile of the following bodyheight (cm) distribution of a sample of humans.

Class intervals :   151-155   156-160   161-165   166-170   171-175   176-180
Frequencies :          7        16        28        27        14         8




5. (a) Discuss the properties of mode.

(b) … the mode of a distribution with unequal class interval lengths, using the mean …

(c) Calculate the mean and the mode of the following distribution of achievement test scores in a sample.

Class intervals :   67-76   77-86   87-96   97-106   107-116   117-126
Frequencies :         8      13      17       20        14         8

6. (a) Describe different types of fractiles.

(b) How would you work out a percentile of a frequency distribution grouped into class intervals ?

(c) … the percentile rank of a score from the cumulative …

(d) … of a sample of resting humans :


CHAPTER 4

1. Complete each of the following statements by choosing and marking by a tick (✓) mark the correct alternative.

(a) Quartile deviation is :

(i) … (ii) … (iii) … measure of dispersion.

(b) A relative measure of dispersion :

(i) does not bear the unit of raw scores, (ii) bears the squared unit of raw scores, (iii) bears the same unit as that of raw scores.

(c) Addition of a constant number to each score of a sample causes the standard deviation :

(i) to be increased, (ii) to remain unchanged, (iii) to be decreased.

(d) Variance is :

(i) a relative measure of dispersion, (ii) in the same unit as the coefficient of variation, (iii) an absolute measure of dispersion.

(e) Standard deviation is worked out as the :

(i) square root of the sum of squares, (ii) square root of the mean square, (iii) square root of the sum of absolute deviations of the scores from the mean.

2. Fill up the blanks in the following statements with correct words chosen from those within the parentheses below.

(a) The standard deviation of a small ungrouped set of scores is worked out using the _____ as the denominator while that of a large set of scores may be computed using the _____ as the denominator.

(b) Coefficient of quartile deviation is a _____ measure of dispersion while range is one of the _____ measures of dispersion.

(c) Variance is the _____ central moment about the mean while the mean deviation of scores from the mean is the _____ central moment.

(d) Coefficient of dispersion exceeds 1 in _____ distributions and is a _____ measure of dispersion.

(e) Variances bear _____ units of raw scores while coefficients of variation bear _____ units of raw scores.

(relative, clumped, first, df, absolute, third, no, sample-size, relative, squared, second.)

3. Mark the odd member in each of the following series by a tick (✓) mark.

(a) : (i) sum of squares, (ii) variance, (iii) quartile deviation, (iv) standard deviation.

(b) : (i) central moment, (ii) range, (iii) standard deviation, (iv) variance.

(c) : (i) coefficient of variation, (ii) quartile deviation, (iii) root-mean-square, (iv) mean deviation.

(d) : (i) standard deviation, (ii) absolute measure of dispersion, (iii) range, (iv) coefficient of dispersion.

4. (a) What is standard deviation ? Discuss its properties.

(b) Explain what you mean by the unbiased SD of small samples, mentioning its computational formulae.

(c) Work out the unbiased SD and variance of the following bodyweight (kg) scores of a sample of humans.

57, 78, 57, 63, 73, 65, 70, 74, 70, 67, 58, 77, 65, 67, 73, 67, 72, 68, 56, 63, 73, 64, 66, 64, 62, 65,
68, 61, 58.

(d) Also compute the coefficients of variation and dispersion for these scores.

5. (a) What is quartile deviation ? Write how you would compute it using the quartiles, mentioning the computation formula.

(b) Discuss the properties of quartile deviation, mentioning its relations to the skewness and kurtosis of a distribution.

(c) Work out the quartile deviation and the coefficient of quartile deviation of the bodyheight (cm) distribution presented in Question 4(b) of Chapter 3.

6. (a) Define variance and coefficient of dispersion.

(b) Work out the variance and CD of the winglength distribution of a sample of cockroaches, presented in Question 6(c) of Chapter 2.

(c) Compute the variance and the coefficient of variation of the interorbital widths (mm) of the following sample of pigeons.

10.4, 13.0, 12.6, 12.5, 10.3, 11.8, 11.6, 12.4, 10.6, 12.9, 10.7, 12.0, 12.5, 11.0, 11.5, 12.2,
11.7, 10.9, 10.6, 11.5, 11.3, 13.0, 11.7, 10.8, 11.1, 12.3.


CHAPTER 5

1. Complete each of the following statements by choosing and marking with a tick (✓) mark the correct alternative given below.

(a) The standard error of difference between means is :

(i) a relative measure of dispersion, (ii) a sampling statistic, (iii) an absolute measure of dispersion.

(b) Means of samples drawn by random sampling from the same population differ from each other due to :

(i) mean deviations, (ii) quartile deviations, (iii) sampling errors.



(c) A sampling distribution of means results from :

(i) sampling errors, (ii) coefficients of dispersion, (iii) ranges.

(d) The degrees of freedom of a statistic depend on :

(i) sample size, (ii) number of precomputed statistics used in its computation, (iii) both.

(e) Sampling distributions may be worked out :

(i) both theoretically and experimentally, (ii) experimentally, (iii) theoretically.

(f) Linear transformations of raw scores change their distribution with respect to :

(i) kurtosis, (ii) mean and SD, (iii) skewness.

2. Indicate with a tick (✓) mark the odd item in each of the following series.

(a) : (i) standard error, (ii) standard deviation, (iii) sampling error, (iv) sampling distribution.

(b) : (i) z score, (ii) T score, (iii) c score, (iv) stanine.

(c) : (i) statistic, (ii) sampling distribution, (iii) standard error, (iv) parameter.

3. Fill up the blanks in the following statements by choosing the correct words from those within the parentheses below.

(a) Sampling error is the difference between a _____ and the _____.

(b) Sampling distributions can be worked out _____ using the laws of probability and _____ using the observed scores of randomly drawn samples.

(c) Standard error of a statistic is a measure of its dispersion around the _____ in the _____ distribution.

(d) The z score is a _____ transformed _____ score … transformations of raw scores.

(…)

4. (a) Define and explain the degrees of freedom of a statistic with examples.

(b) What are the sampling errors of a statistic ? Explain with an example.

(c) Write briefly about the sampling distributions of statistics and their differences.

(d) The standard deviation of memory test scores amounted to 2.35 in a group of 30 students, and to 3.1… in another group of 42 students. Work out the SE of the difference between the mean memory test scores.

5. (a) What do you mean by the standard error of a statistic ?

(b) … the standard error of the mean and the SE of difference between means.

(c) Describe, mentioning the computational formulae, how the SE of the mean is worked out for samples drawn from different types of populations by different sampling methods.

(d) Work out the SE of the mean using the following interorbital widths (mm) …

6. (a) Explain what is meant by the standard scores, citing examples.



(b) Describe how the difference between two sample means can be transformed into the standard deviate, 
mentioning the formulae. 

(c) Work out the SE of the mean of the following bodyweight (kg.) distribution in a sample of humans. 

Class intervals : 45-51 52-58 59-65 66-72 73-79 80-86 

Frequencies : 5 15 20 14 8 3 

7. (a) Write about the z score as a linearly transformed standard score. 

(b) Discuss the SE of difference between means and its computation from the sample variances. 

(c) The standard deviations of kneejerk strength scores were found to be 6.60° in 25 athletes and 5.25° 
in 16 nonathletes. Work out the SE of the difference between the means. 


CHAPTER 6

1. Complete each of the following statements by choosing and marking with a tick (✓) mark the correct alternative given below.

(a) Normal distribution is a probability distribution based on :

(i) Poisson equation, (ii) Gosset equation, (iii) Gaussian equation.

(b) Student's t distribution is a :

(i) theoretical probability distribution, (ii) discrete probability distribution, (iii) experimental probability distribution.

(c) The probability of random occurrence of any one of a number of alternative and mutually exclusive events is given by the :

(i) multiplication theorem, (ii) binomial theorem, (iii) addition theorem.

(d) The probability of the successive occurrence of a given number of independent events at random is given by the :

(i) Gaussian theorem, (ii) multiplication theorem, (iii) binomial theorem.

(e) The sampling distribution of the means of large samples from a non-normally distributed population with a finite variance is given by the :

(i) sampling theory of means, (ii) central limit theorem, (iii) central theorem of probability.

(f) The unit normal curve is :

(i) platykurtic, (ii) leptokurtic, (iii) mesokurtic.

(g) The Bernoulli distribution gives the probabilities of random occurrences of events of a class of a variable having a population distribution obeying the :

(i) binomial distribution, (ii) t distribution, (iii) Poisson distribution.

(h) Student's t distribution is :

(i) mesokurtic, (ii) leptokurtic, (iii) platykurtic.

2. Indicate with a tick (✓) mark the odd item in each of the following series :

(a) : (i) Student's t distribution, (ii) binomial distribution, (iii) Poisson distribution, (iv) skewed distribution.

(b) : (i) continuous distribution, (ii) normal distribution, (iii) t distribution, (iv) binomial distribution.




(c) : (i) Poisson distribution, (ii) unit normal curve, (iii) Student's t distribution, (iv) normal distribution.

(d) : (i) … distribution, (ii) dichotomized variable distribution, (iii) Poisson distribution, (iv) t distribution.

(e) : (i) …kurtic distribution, (ii) t distribution, (iii) Poisson distribution, (iv) normal distribution.

3. Match each item of Column 1 with the correct item in Column 2, putting the serial number of the latter in the space after the former.

Column 1

(a) small sample _____
(b) level of significance _____
(c) rare events _____
(d) Bowley's coefficient _____
(e) Bernoulli expansion _____
(f) mesokurtosis _____

Column 2

(i) skewness
(ii) unit normal curve
(iii) critical z
(iv) binomial distribution
(v) t distribution
(vi) asymptotic
(vii) Poisson distribution

4. Fill up the blanks in the following statements with the correct words from those given within the parentheses below.

(a) Normal distributions have _____ skewness while Poisson distributions possess _____ skewness.

(b) The unit normal curve has its highest ordinate at the _____ and its mean amounts to _____.

(c) The binomial distribution of one of the classes of a dichotomous variable is _____ skewed when the events of that class have a _____ proportion in the population than those of the other class.

(d) The … distribution can be theoretically worked out using the equation of _____ while the normal distribution can be theoretically plotted using the equation of _____.

(e) The coefficient of dispersion is _____ than 1.00 in binomial distributions and _____ 1.00 in Poisson distributions.

(f) Probabilities of events of the rare class of a dichotomous variable form a _____ distribution whose mean is identical with the _____.

(g) The binomial distribution of one of the classes of a dichotomous variable is _____ skewed when the events of that class have a lower proportion in the population than those of the other class, but is _____ skewed when the two classes have an identical proportion.

(h) The fractional area beyond the two-tail critical $z_\alpha$ in any one of the tails of the unit normal curve amounts to _____ the corresponding two-tail $\alpha$ while that beyond the one-tail critical $z_\alpha$ _____ the corresponding one-tail $\alpha$.

(negatively, variance, equals, no, half, Gauss, mean, positive, not, less, …, zero, Bernoulli, equals, Gosset, positively, higher, Poisson)

5. (a) What is Poisson distribution ? Describe its properties.

(b) Discuss the assumptions for applying the Poisson distribution to the data.

(c) How do you use the Poisson distribution to find the probability of random occurrence of Y number of cases of the rare class of a dichotomous variable in a sample of size n drawn from a population known to have the proportion p of such rare cases ?




(d) Work out and interpret the probability of random occurrence of 4 Dov^m syndrome cases i n a 
of 200 humans from a population with 12 such cases on average per st u s. 


6. (a) Compare the binomial and Poisson distributions. 

(b) Discuss the assumptions underlying the use of the binomial distribution.

(c) Work out and interpret the binomial probability of random occurrence of o fluorosis cases in a sample 
of 20 individuals drawn from a population known to have 30% incidence of fluorosis. 


7. (a) Compare the properties of normal and Student's t distributions. 

(b) Write the mathematical equation used in the theoretical computation of normal probability distributions,
and its modified form for plotting the unit normal curve.

(c) Write briefly about the two-tail and one-tail critical z scores.

8. (a) What is the unit normal curve ? Describe its principal properties. 

(b) Explain what you mean by the best-fitting normal distribution. 

(c) Work out the best-fitting normal distribution for the following observed frequency distribution of 
body weight scores (kg) of a sample of 120 humans 

Class intervals : 41-47  48-54  55-61  62-68  69-75  76-82  83-89

Frequencies     :   5     15     25     43     21     10      1

9. (a) Describe the principal properties of Student's t distributions. 


(b) Explain why t distributions are applicable to both small and large samples.
(c) What are critical t scores ?


10. (a) Describe the properties of skewed distributions.

(b) Mention the formulae for working out different measures of skewness.

(c) Work out Pearson's second coefficient of skewness of an observed distribution of interorbital widths
of a sample of pigeons, having the following statistics. 

Mean : 12.0 mm ; Median : 13.6 mm ; SD : 1.70 mm. 
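
The arithmetic asked for in Q.10(c) can be sketched as below. This is only an illustration, assuming that Pearson's second coefficient of skewness is taken as 3(mean - median)/SD; the function name is hypothetical and not from the text.

```python
def pearson_second_skewness(mean, median, sd):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / SD."""
    return 3.0 * (mean - median) / sd


# Statistics given in Q.10(c) for the interorbital widths of pigeons (mm).
sk = pearson_second_skewness(mean=12.0, median=13.6, sd=1.70)

# A negative coefficient indicates negative skewness (mean below the median).
print(f"Sk = {sk:.2f}")
```

The sign of the result is what is interpreted: the mean lies below the median here, so the coefficient comes out negative.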

11. (a) Describe different types of kurtosis of distributions with examples.

(b) Write briefly about the measures of kurtosis with their computational formulae.

(c) Work out and interpret the percentile coefficient of kurtosis and Bowley's quartile coefficient of skewness
of a frequency distribution of achievement test scores, having the following statistics.

P,0 9S.S; 101.0; ^ = 118.8 ; /> 75 = 119.7 ; 

12. (a) Explain what are confidence intervals and fiducial probabilities.

(b) Describe with computational formulae the working out of the confidence intervals for means, using z
and t scores respectively.

(c) The mean and SD of winglength scores of 12 houseflies were found to be 4.8 mm and 0.74 mm
respectively. Find the confidence limits of the population mean at 0.95 and 0.99 fiducial probability,
using t scores.

(d) The mean and SD of numerical operation test scores nf ai\ • i , 

respectively. Find the 95% confidence limits of the populatin' Werefound t0 be 375 ^ 6 ‘ 

P°P tion mean, using z scores. 




1. Choose and mark with a tick (✓) mark the correct alternative, given below, for completing each of the
following statements.

(a) Probability of type I error of inference equals that fractional area of the H0 distribution which equals :
(i) the rejection region of the H0 distribution, (ii) the acceptance region of the H0 distribution, (iii) the
area of overlap of the Ha and the H0 distributions.

(b) The null hypothesis proposes in any experiment that the observed results have occurred merely due
to :
(i) an association between the dependent and independent variables, (ii) chances of random sampling
depending on the laws of probability, (iii) reasons other than these.

(c) The t tests can be applied if the dependent variable in the experiment happens to be :
(i) a discrete variable, (ii) a nominal variable, (iii) a continuous variable.

(d) Probability of type II error of inference equals :
(i) the rejection region of the H0 distribution, (ii) the area of overlap of the Ha distribution with the
acceptance region of the H0 distribution, (iii) the acceptance region of the H0 distribution.

(e) t tests can be used for the samples drawn from a population where the dependent variable is normally
distributed, if the samples are :
(i) small, (ii) large, (iii) either small or large.

(f) The unit normal curve and z scores can be used for finding the significance of difference between two
group means, provided :
(i) the groups are small and the variable is normally distributed, (ii) the groups are large and the variable
is normally distributed, (iii) the groups are large and the dependent variable is a discrete variable.


2. Match each item of Column 1 with the correct item of Column 2 and put the serial number of the latter
in the space after the former.

Column 1
(a) rejection region ___
(b) one-tail test ___
(c) type II error ___
(d) null hypothesis ___
(e) pooled SD ___
(f) type I error ___
(g) difference method ___
(h) two-tail test ___

Column 2
(i) significant difference
(ii) wrongful rejection of H0
(iii) random sampling
(iv) critical z score
(v) small single-group
(vi) large single-group
(vii) wrongful acceptance of H0
(viii) significantly higher/lower
(ix) small independent groups

3. Fill up the blanks in the following statements with the correct words chosen from those given within the
parentheses below.

(a) Probability of type I error of inference would ___ with the ___ in the level of significance.

(b) The null hypothesis is ___ if the computed z score falls within the ___ region of the H0 distribution.



(c) Whether or not one group mean is significantly lower than the other is explored by a ___ test while
a ___ test is used to find if one group mean is significantly different from the other.

(d) Type II error of inference consists of a wrongful ___ of the H0 while type I error consists of a wrongful
___ of the H0.

(e) The alternative hypothesis is ___ if the computed z score falls within the ___ region of the H0
distribution.

(f) Probability of type II error of inference would ___ with the ___ in the level of significance.

(g) For a test for a significant difference between two means, ___ of the critical region lies in one of
the tails of the H0 distribution while in a test for one mean being significantly higher than the other,
one of the tails carries ___ of the critical region.

(fall, rejection, whole, retained, rise, two-tail, half, accepted, rise, acceptance, rise, rejection, one-tail,
rejected, acceptance)

4. (a) Discuss the assumptions underlying the Student's t test.

(b) Explain what is meant by matched-pair groups. 

.... 

; 1 " ds ‘>•" mean 

" mw " 

. . .. rs score « f ^ 10 . ' . . 

.. . IM . — —»« 

•assas. 

(c) Work out an appropriate t test to find whethfr . 

“ ,ha " in U,e fira *“ * sensory-mot Jr « ?££££ ? “ higher in the suond 

Individual j 2 ^ ' lhc follow,n g group of 10 subjects. 

Test scores: 4 5 6 7 8 9 

first trial : 50 62 ^ 

second trial: 62 65 54 53 4 J 28 37 35 

6. («) What is the level of significance ? Discos its , “ 48 44 

(*> Wta * 3,5 ,he assumptions for using the - sc " ^ ^ ^ 0f W ««nce. 

(c) The mean and SD of birihweights were re ^ ” SlgnmC1U>Ce teas ? 

^ fW 115 ® n,4,onl infants. 

r <„, :n;r igh ' is 

- - c:r:r d -" ra ^ 

(b) Describe how Student's r sr • bct9f ^a means. ^P^esis, and how they are u 

° f -- 

- -? experiment. canc e of difference between m«.- 


10 

46 

63 




(c) Apply t test to find whether or not there is a significant difference between the mean resting and
fatigued grip strengths of the following group of 10 individuals.

Individuals    :  1   2   3   4   5   6   7   8   9  10

Grip strengths
resting        : 10  12  14   9  11  13  10   8  12  15
fatigued       :  4   6   7   5   8  11   5   4   9   8


8. (a) Give an account of the assumptions for t test.

(b) Describe how you would work out the t test for significance of difference between the means of
(i) large unequal-size independent groups and (ii) small unequal-size independent groups.

(c) Use t test to find whether or not the mean grip strength of the following group of athletes is significantly
higher than that of the nonathletes.


Athletes    : 10, 12, 9, 16, 11, 14, 15, 13, 11, 12, 14.

Nonathletes : 6, 4, 3, 7, 6, 5, 8, 4, 9.


* J t o, 4, 9 

9. (a) Describe the importance of the df of t in testing significance, and how the df may be worked out
for unequal-size and equal-size independent groups and for single-group experiments.

(b) Compare, mentioning the formulae, the t tests for significance of difference between means from the
dependent variable scores of small unequal-size independent groups and of large single-group
experiments.

(c) Apply an appropriate t test to find whether or not there is a significant difference between the mean
interorbital widths (mm) of the following groups of males and females.

f MS| “ : U ' 3 ' >«• <1.9. ,3.0. 13.4 u.g 127 129 133 ,,, 

Females : 10.5. 10.0. ,0.4. „.o. ,0.9. , 0 . 7 . „. 3 , i 0 . 8 , ££ 


CHAPTER 8 


1. Match each item of Column 1 with the correct item of Column 2 and put the serial number of the latter
in the space after the former.

Column 1
(a) multiple regression ___
(b) phi coefficient ___
(c) Spearman's rho ___
(d) multiple correlation ___
(e) slope of regression line ___
(f) model I regression ___
(g) biserial r ___
(h) model II regression ___
(i) product-moment r ___

Column 2
(i) average ranks
(ii) Fisher's z transformation
(iii) regression coefficient
(iv) sum of quotients
(v) genuinely dichotomous variables
(vi) classification variable
(vii) artificially dichotomized variable
(viii) treatment variable
(ix) partial regression coefficients
(x) variance ratio




2. Complete each of the following statements by choosing and marking by a tick (✓) mark the correct alternative
out of those given below.

(a) Variables with more than two classes are correlated by :
(i) tetrachoric r, (ii) biserial r, (iii) phi coefficient, (iv) contingency coefficient.

(b) If the parametric correlation coefficient amounts to + 0.37, the sampling distribution of product-moment
r values has :
(i) no skewness, (ii) negative skewness, (iii) positive skewness, (iv) either (ii) or (iii).

(c) Two ordinal variables are correlated by :
(i) tetrachoric r, (ii) Kendall's tau, (iii) phi coefficient, (iv) contingency coefficient.

(d) Linear regression can be worked out only if both dependent and independent variables have :
(i) discrete distributions, (ii) significant linear correlation, (iii) continuous distributions, (iv) both (ii)
and (iii).

(e) A normally distributed continuous variable is artificially dichotomized if in the sample its scores are :
(i) too many, (ii) non-normally distributed, (iii) distributed in a truncated form, (iv) either (ii) or (iii).

(f) A genuinely dichotomous variable is correlated with a continuous measurement variable by :
(i) point-biserial r, (ii) biserial r, (iii) phi coefficient, (iv) Spearman's rho.

3. Fill up the blanks in each of the following statements by correct words chosen from those given within
the parentheses below.

(a) The significance of the multiple correlation coefficient may be tested by converting it either
to ___ score or to ___ ratio.

(b) The slope of the regression line of a criterion on one of the predictors is given by a ___ ___
coefficient when the other predictors are partialled out.

(c) Deviations of criterion scores from the predicted criterion score in a regression are estimated
by the ___ of ___.

(d) Phi coefficient correlates two ___ variables while Kendall's tau correlates two ___ variables.

(e) Product-moment r has ___ skewed sampling distributions if ρ has a positive value,
while its sampling distributions are ___ skewed if ρ has a zero value.

(f) Giving average ranks to tied scores is a source of error for two correlation statistics, ___ and ___.

(rho, regression, negatively, SE, variance, not, partial, F, ordinal, t, tau, dichotomous, continuous, estimate)

4. Mark the odd item in each of the following series with a tick (✓) mark. 

(a) : (i) biserial r, (ii) Spearman's rho, (iii) phi coefficient, (iv) point biserial r.

(b) : (i) multiple correlation, (ii) product-moment correlation, (iii) Kendall's tau, (iv) Spearman's rho.

(c) : (i) partial correlation, (ii) multiple correlation, (iii) multiple regression, (iv) simple regression.

• (0 b )1( . (it) b l2 y (a,) (iv) 

5. (a) Discuss the assumptions underlying the product-moment r.

(b) Give different formulae for working out product-moment r respectively from raw scores, sum of products,
covariance and sums of squares of ungrouped data. How do you test its significance ?




(c) Work out the product-moment r between the tracheal ventilation scores (ml/min) and the O2
consumption scores (ml/min) of the following sample of locusts, and test its significance.

Individual      :   1    2    3    4    5    6    7    8    9   10
Ventilation     :  85   80   70   68   68   75   70   60   71   73
O2 consumption  : 4.0  3.4  2.5  2.7  2.5  3.2  3.0  2.5  3.0  3.2
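
One way to check a hand computation for Q.5(c) is sketched below. It assumes the deviation-score formula r = SP / sqrt(SSx × SSy) and the conversion t = r × sqrt((n - 2) / (1 - r²)) with df = n - 2; the function names are illustrative only.

```python
from math import sqrt

# Scores of the 10 locusts as listed in Q.5(c).
ventilation = [85, 80, 70, 68, 68, 75, 70, 60, 71, 73]
o2_consumption = [4.0, 3.4, 2.5, 2.7, 2.5, 3.2, 3.0, 2.5, 3.0, 3.2]


def product_moment_r(x, y):
    """Product-moment r from the sum of products and the sums of squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


def t_from_r(r, n):
    """Convert r to Student's t with df = n - 2."""
    return r * sqrt((n - 2) / (1.0 - r * r))


r = product_moment_r(ventilation, o2_consumption)
t = t_from_r(r, len(ventilation))
print(f"r = {r:.3f}, t = {t:.2f}, df = {len(ventilation) - 2}")
```

The computed t is then compared with the tabulated critical t for df = 8 at the chosen level of significance.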


6. (a) What are the assumptions for Spearman's rank-difference correlation coefficient ?

(b) Describe, mentioning the computational formulae, the working out of Spearman's rho and the testing
of its significance.

(c) Work out Spearman's rho with the data of the preceding Q.5(c) and test its significance.


7. (a) What is partial correlation ? Describe its assumptions. 

(b) Describe the computation of the first-order partial r between variables X1 and X2, partialling out variable
X3, and the testing of significance of the computed partial r.

(c) Use the following data from a sample of 53 humans to find whether or not there is a significant partial
linear correlation between glomerular filtration rate (X1 ml/min) and glomerular blood pressure (X2 mm
Hg) when the effect of plasma protein osmotic pressure (X3 mm Hg) is partialled out.

r12 = +0.68 ; r13 = -0.32 ; r23 = +0.18.

Also find if there is a significant multiple linear correlation between X1 and the combination of X2 and
X3 in this case.
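
The computation for Q.7(c) might be sketched as follows. The sketch assumes the usual first-order partial r formula, a t conversion with df = n - 3, and an F ratio with df = (2, n - 3) for the multiple R of a criterion on two predictors; the helper names are hypothetical.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """First-order partial r between X1 and X2 with X3 partialled out."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def multiple_r(r12, r13, r23):
    """Multiple R of criterion X1 on the two predictors X2 and X3."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return sqrt(r_squared)


n = 53  # sample size given in Q.7(c)
r12, r13, r23 = 0.68, -0.32, 0.18

r12_3 = partial_r(r12, r13, r23)
t_partial = r12_3 * sqrt((n - 3) / (1 - r12_3 ** 2))  # df = n - 3

big_r = multiple_r(r12, r13, r23)
f_ratio = (big_r ** 2 / 2) / ((1 - big_r ** 2) / (n - 3))  # df = (2, n - 3)

print(f"partial r = {r12_3:.3f}, t = {t_partial:.2f}")
print(f"multiple R = {big_r:.3f}, F = {f_ratio:.2f}")
```

Both statistics are judged against tabulated critical values for the stated df.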

8. (a) Describe the assumptions for multiple linear correlation. 


(b) Describe, mentioning computational formulae, how you work out the multiple linear correlation between
a criterion (X1) and the weighted sum of two predictors (X2 and X3), and test its significance using
critical t and critical F values.


(c) Using the following data from a sample of 33 students, find whether or not there is a significant first-
order partial correlation between their mathematical aptitude test scores (X1) and abstract reasoning test
scores (X2), partialling out their numerical test scores (X3).

r12 = +0.53 ; r13 = +0.22 ; r23 = +0.13.

Also find if there is a significant multiple linear correlation between X1 and the combination of X2 and
X3, using critical F values.

9. (a) Describe the assumptions for Kendall's tau.

(b) How do you work out Kendall’s tau for correlating the scores of two continuous variables, and find 
its significance ? 


(c) Work out Kendall’s tau between the gill weights (mg) and body weights (g) of a sample of 10 crabs, 

given in Example 8.2.4. (page 146) to find whether or not there is a significant correlation between 
them. 

10. (a) Describe the properties of product-moment r.

(b) How do you test the significance of the computed r, by transforming it respectively to Student's t and
Fisher's z ?

(c) Work out product-moment r between the vocabulary test scores of the following students and the marks
obtained by them in English in a school examination, and test its significance.




162. 

140. 


204. 232. 

150. 122, 


174. 

130. 


Student          :  1   2   3   4   5   6   7   8   9  10

Vocabulary score : 12  36  10  25  34  23   9  30  11  30

English marks    : 37  59  31  50  60  49  28  54  45  58

11. (a) Why would you work out point biserial r ? What are the assumptions for point biserial r ?

(b) Describe, mentioning formulae, how to work out r_pbi and test its significance.

(c) Work out a suitable correlation coefficient with the following pulmonary tidal volume scores (ml) of
10 males and 10 females to find whether or not there is a significant correlation between sex and tidal
volume. Justify your choice of the correlation coefficient.

Males : 520, 492, 456, 525, 515, 550, 490, 520, 545, 500.

Females : 460, 476, 440, 410, 404, 385, 370, 402, 375, 355.

12. (a) In what cases is a continuous measurement variable dichotomized for linear correlation ? 

(b) Describe with computational formulae the computation of biserial r and the test of its significance. How
are r_b and r_pbi interconverted to one another ?

(C) and^M^" ° f " ot a s " linw between systolic blood pressure (mm Ha) 

»... . . 

Diabetics : 244. 158. 140. 20$. 180. 216. 

Nondiabctics : 120. 142. 132. 138. 160. Its! 

13. (a) Write about different models of regression with examples. 

(b) Discuss the assumptions for simple linear regression 

(c) Use the data of the preceding Q.5(c) to work out the regression equation of O2 consumption on
tracheal ventilation of locusts.

14. (a) Describe the properties of simple linear regression. 

(b) Write, mentioning computational formulae, how you would work out the linear regression of variable
Y on variable X, using respectively the raw scores, the sum of products, the covariance, and the product-
moment r.

(c) Use the data of the preceding Q.10(c) to compute the linear regression of the examination marks in
English on the vocabulary scores of students.
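
A least-squares sketch for Q.14(c) is given below, using the vocabulary scores and English marks listed under Q.10(c). It assumes the sum-of-products route, b = SP / SSx and a = mean of Y - b × mean of X; the function name is illustrative.

```python
# Scores of the 10 students as listed under Q.10(c).
vocabulary = [12, 36, 10, 25, 34, 23, 9, 30, 11, 30]      # X
english_marks = [37, 59, 31, 50, 60, 49, 28, 54, 45, 58]  # Y


def linear_regression(x, y):
    """Slope and intercept of the regression of Y on X (least squares)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = linear_regression(vocabulary, english_marks)
print(f"Predicted marks = {intercept:.2f} + {slope:.3f} x vocabulary score")
```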

15. (o) What is multiple linear regression ? Discuss its assumptions. 

(b) Describe, mentioning formulae, how you compute the multiple linear regression of a criterion X1 on the
given values of predictors X2 and X3, and the SE of estimate of the predicted values of the criterion.

(c) Use the data of the preceding Q.7(c) to work out the multiple regression of glomerular filtration rate on
glomerular blood pressure and plasma protein osmotic pressure. Also compute the coefficient of multiple
determination and the SE of estimate.

16. (a) What is the purpose of using the contingency coefficient ? Comment on the assumptions needed for
it.

(b) Describe the computation of contingency coefficient and the test for its significance.

(c) Work out the contingency coefficient to find if there is a significant association between the socioeconomic
classes of students and the undergraduate courses of their enrolment, using the following distribution
of students from different socioeconomic classes in different courses.

Courses           Socioeconomic classes          Total
                   1      2      3      4
Humanities        10     72     95     15         192
Science           25     80    125     95         325
Commerce          22     40     86     55         203
Total             57    192    306    165         720
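
The contingency coefficient for Q.16(c) could be checked with a sketch like the one below. It assumes the "sum of quotients" route, S = Σ fo² / (row total × column total), with chi square = N(S - 1) and C = sqrt((S - 1) / S); the function name is hypothetical.

```python
from math import sqrt

# Observed frequencies from the table in Q.16(c); columns are classes 1 to 4.
observed = [
    [10, 72, 95, 15],   # Humanities
    [25, 80, 125, 95],  # Science
    [22, 40, 86, 55],   # Commerce
]


def contingency_coefficient(table):
    """Return (chi square, C, df) computed from the sum of quotients."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = sum(
        obs * obs / (row_totals[i] * col_totals[j])
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    chi_square = n * (s - 1.0)
    c = sqrt((s - 1.0) / s)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, c, df


chi_square, c, df = contingency_coefficient(observed)
print(f"chi square = {chi_square:.2f}, C = {c:.3f}, df = {df}")
```

The significance of C is judged through the chi square, compared with the tabulated critical value for df = 6.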


• wv/l I VIUUVII^ ’’ l U I vAullipivO« 

(b) Discuss the assumptions underlying the use of biserial r. 

(c) Describe the computation and the test for significance of biserial r. 

(d) Find whether or not there is a significant correlation between two given IQ groups of students and their 
memory test scores, using the following data and justifying your choice of the correlation coefficient. 

mnru fort 


Memory test
score ranges     : 40-44  45-49  50-54  55-59  60-64  65-69  70-74

No. of students
with IQ < 120    :   15     22     28     14     11      8      2
with IQ ≥ 120    :    2      4      8     25     32     17     12

dichotomous variable.

ial r with those of 

biserinl r. 



(0 How can you convert r„ into r phj and vice versa ? 

.. 


Memory test
score ranges   : 41-45  46-50  51-55  56-60  61-65  66-70  71-75

No. of females :    3      3      9     22     35     20      8
No. of males   : [illegible]  4     12     25     30     16     11


CHAPTER 9

1. Mark the odd item in each of the following series with a tick (✓) mark.


(a) : (i) Yates' correction ; (ii) chi square ; (iii) rank sum ; (iv) G test.

(b) : (i) goodness of fit ; (ii) contingency table ; (iii) association ; (iv) independence.

(c) : (i) analysis of frequencies ; (ii) goodness of fit ; (iii) independence ; (iv) signed rank.

(d) : (i) composite rank test ; (ii) chi square test ; (iii) Mann-Whitney test ; (iv) median test.

2. Match each item of Column 1 with the correct item in Column 2 and put the serial number of the latter
in the space after the former.



Scanned by CamScanner 









362 


STATISTICS IN BIOLOGY AND PSYCHOLOGY 


Column 1
(a) signed ranks ___
(b) Yates' correction ___
(c) composite ranks ___
(d) contingency table ___
(e) statistic G ___

Column 2
(i) association
(ii) t test
(iii) chi square
(iv) paired observations
(v) independent groups
(vi) log-likelihood ratio statistic

3. Fill up the blanks in each of the following statements with the correct words chosen from those within
the parentheses below.

(a) Whether or not an observed distribution conforms significantly to a proposed distribution may be tested
by ___.

(b) The Mann-Whitney test is a nonparametric alternative to the t test for ___ groups of ___ sizes.

(c) The significance of difference between the scores of more than two groups can be tested nonparametrically
by the ___ test, based on the H0 proposing a common ___ of all the groups.

van.ih'li's <|U 1,1 U ’ " r -the of •-tabic showing the assoc. u „ botwou 

' ' to* U and 'l“ri," “ ** "'* TOUk 10 ' napocUvely 

W ~ byw -«-P-^V.-4>u_ by o,5fo r v„.,. H 

0>) The G test for goodness of fit n^lc ■« \\i 

there are-than two classes in the ^ "*** Sl2C - 200 

(exceeds, contingency, independent, more, decreased. «,mi r . 

independence, unequal, composite, median) q ai, o, median, signed, increased, chi-square, 

4. Complete each of the following statements by choosing and marking with a tick (✓) mark the correct
alternative given below.

«0 Whether or no. an observed distribution confomts to a norma) dtstrihn, 

(0 chi square test for goodness of fit (ii) n,; c butl0n can be toted by : 

Of flt. (iv) eiUter („ J m * » ^ **» “» ^dependence. <„ 0 C lesl for ^ 

(b) In two-tail composite rank tests for small unequal si 7 . 

significantly if the smaller sum of ranks : mp,e ’ the sa mple means do not differ 

(0 falls between the critical upper and lower T value, nn 

lower than the cntica] lower T value, (iv, equals the critic “w r' CriUcal T «« “ 

(c) In a two-tail rank sum test for large equal-size am., v 

sum of ranks : c P S- tbe group means differ tti a..: n 

/A . t Qmer s >gnificantly if the smaller 

(/) exceeds the critical T a computed from T an a 

C° iS l0Wer ,han lhe °““ l •— r value ’J «**■> computed from f. 

( d) For samples not below 8 in size, the Mann-Whitnev /; CnUCal “ PPCr T va,ut of uWt 

« ! he «™ f ranks, («, the smaller sum of ^ : 

rank sums. ranks « («0 any of the tun i 

** two rank sums, (iv) both the 




(e) A fourfold contingency table is used for computing chi square if :

(i) the variable in a test for goodness of fit is divided into four classes, (ii) the variables in a test of
independence are each divided into two classes, (iii) the variable in a test for goodness of fit is divided
into two classes, (iv) the variables in a test of independence are each divided into four classes.

5. (a) Describe the working out of a nonparametric statistic proposed by Wilcoxon for the significance of
difference between the unpaired observations of two equal-size independent groups. What are the sources
of its inaccuracies ?

(b) Elaborate on how you would test the significance of such computed statistic in case of (i) small unequal-
size groups and (ii) large equal-size groups.

(c) Apply an appropriate nonparametric test to find whether or not the mean winglengths (mm) of the
following two groups of cockroaches differ significantly. Justify your choice of the test.

Group I : 28, 34, 37, 35, 42, 29, 38, 46, 40, 39.

Group II : 24, 20, 23, 22, 28, 20, 21, 23, 21, 28.
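
The ranking step for Q.5(c) can be sketched as below. This is only an illustration: it pools the two groups, gives average ranks to tied scores, and then derives the rank sums, the Mann-Whitney U and a large-sample z without any correction for ties. The helper names are not from the text.

```python
from math import sqrt

group_1 = [28, 34, 37, 35, 42, 29, 38, 46, 40, 39]
group_2 = [24, 20, 23, 22, 28, 20, 21, 23, 21, 28]


def average_ranks(values):
    """Rank values from 1 upwards, giving tied scores their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same score as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


pooled_ranks = average_ranks(group_1 + group_2)
n1, n2 = len(group_1), len(group_2)
rank_sum_1 = sum(pooled_ranks[:n1])
rank_sum_2 = sum(pooled_ranks[n1:])

# Mann-Whitney U values derived from the rank sums; the smaller one is tested.
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - rank_sum_1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - rank_sum_2
u = min(u1, u2)

# Large-sample normal approximation (no tie correction applied here).
z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

print(rank_sum_1, rank_sum_2, u, round(z, 2))
```

For small groups such as these, the smaller rank sum or U is normally referred to the tabulated critical values rather than to z.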

6. (a) Describe the computation and interpretation of a powerful nonparametric statistic preferable for testing
the significance of difference between unpaired observations of two unequal-size independent groups,
mentioning the differences in the method according to the group sizes.

(b) Work out and interpret an appropriate nonparametric statistic for finding any significant difference 
between the mean grip strengths (kg) of the following two independent groups of athletes, justifying 
your choice of the statistic. 

Group I : 10, 12, 14, 9. 11. 13, 10, 8. 12, 

Group II: 14, 6, 7, 5, 8. 11, 5, 4, 9, 

(c) Use the same data to work out and interpret an appropriate Wilcoxon test. 

7. (a) Name and describe a nonparametric test of Wilcoxon for the significance of difference between paired
observations in single-group experiments, mentioning the difference in the method according to the
sample size.

(b) Work out the aforesaid statistic to find whether or not the mean memory test scores, before and after
practice, differ significantly in the following group of subjects.


15, 

8 . 


14. 


Subject   Before   After        Subject   Before   After
   1        21       26           17        11       32
   2        18       39           18        35       24
   3        17       35           19        19       37
   4        19       38           20        28       38
   5        35       28           21        23       27
   6        32       24           22        18       22
   7        13       30           23        30       32
   8        12       28           24        29       16
   9        27       19           25        23       39
  10        30       37           26        29       35
  11        20       25           27        32       39
  12        23       32           28        30       32
  13        20       20
  14         8       13
  15        31       31
  16        16       33






(c) Apply the composite rank test to the following winglengths (mm) of two groups of houseflies
to find if there is a significant difference between the group means.

Group I : 4.0, 4.8, 3.9, 5.3, 5.4, 5.5, 4.7, 5.4.

Group II : 3.2, 3.6, 3.9, 3.3, 3.9, 4.0, 3.5.

8. (a) Describe where and how you would work out the median test and interpret the result.

(b) When would you use the fourfold contingency table for this test ?




(c) Apply the median test to the following tracheal ventilation scores (ml/min) of three samples of beetles,
exposed to three different levels of a pesticide, to find if there are significant differences between the
scores of different groups.

Group I : 80.6, 85.0, 82.7, 78.7, 84.6, 85.0, 87.0, 86.4, 80.7.

Group II : 71.7, 77.2, 70.3. 69.2, 67J, 70.1, 67.8, 73.7. 

Group III : 58.5, 65.0, 59.1. 56.7, 57.6, 59.0. 61.2, 59.6, 62.7. 

9. (a) What are nonparametric statistics ? Compare them with parametric statistics, with examples.

(b) Why is the nonparametric chi square test considered as an analysis of frequencies ? Mention its basic
formula and properties.

(c) Describe the probability distributions of chi squares. How do you determine the df in
the tests of goodness of fit with normal, [illegible] and Mendelian phenotype distributions, respectively ?

(d) Use the chi square test to find whether or not there is a significant goodness of fit between the following
phenotype distribution of Drosophila and the Mendelian 9:3:3:1 distribution.

Phenotypes  : Grey-bodied     Grey-bodied         Black-bodied    Black-bodied
              red-eyed (AB)   scarlet-eyed (Ab)   red-eyed (aB)   scarlet-eyed (ab)

No. of flies :    105              32                  37              18
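
A sketch of the goodness-of-fit arithmetic for Q.9(d) follows. It assumes expected frequencies taken directly from the 9:3:3:1 ratio and df = number of classes - 1, since no parameter is estimated from the sample; the function name is illustrative.

```python
observed = [105, 32, 37, 18]  # AB, Ab, aB, ab phenotypes from Q.9(d)
mendelian_ratio = [9, 3, 3, 1]


def chi_square_goodness_of_fit(observed, ratio):
    """Chi square of observed frequencies against a proposed ratio."""
    n = sum(observed)
    expected = [n * part / sum(ratio) for part in ratio]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, expected


chi_square, expected = chi_square_goodness_of_fit(observed, mendelian_ratio)
df = len(observed) - 1

print(f"expected = {expected}")
print(f"chi square = {chi_square:.3f}, df = {df}")
```

The computed chi square is compared with the tabulated critical value for df = 3 (7.815 at the 0.05 level in standard tables).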

10. (a) In 80 litters of an animal species having a population male : female ratio of [illegible],
the litter-wise frequency distribution of females was found to be as follows. Use the chi square test
to find whether or not the observed litter-wise distribution of females has a significant goodness of fit
with the binomial distribution expected from the given population sex ratio.

No. of females per litter : 0   1   2   3   4   5

No. of litters            : 2  15  26  22  12   3

(b) What is Yates' correction ? Describe when and how you would apply it in a chi square test for goodness
of fit.

(c) Apply chi square test to find whether or not there is a significant difference between the observed
frequency distribution (fo) of body heights (cm) of 90 humans and the best-fitting normal distribution
(fe).

Classes : 146-150  151-155  156-160  161-165  166-170  171-175  176-180

fo      :    4       12       12       24       20       15        3

fe      :   4.3      9.9     17.1     21.4     19.0     12.3      6.0

11. (a) Explain what is meant by the chi square test of independence.

(b) Describe, mentioning formulae, how you would frame a contingency table and use it for the chi square
test of independence.

(c) Describe the use of a fourfold contingency table for computing chi square in an alternative way, without
working out the expected frequencies of cells.

(d) Out of 60 subjects passing in a psychological test, respectively 30, 17 and 13 individuals were rated
subsequently as above-average, average and below-average in laboratory performance. But out of 30
subjects failing in the psychological test, 4, 11 and 15 individuals got respectively the above-average,
average and below-average ratings in laboratory performance. Use chi square test to find whether or
not laboratory performance is independent of the psychological test results.
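
For the ratings of the 60 passing and 30 failing subjects in Q.11(d), the chi square test of independence could be sketched as below. It assumes expected cell frequencies of (row total × column total) / N and df = (rows - 1)(columns - 1); the function name is hypothetical.

```python
# Rows: passed, failed. Columns: above-average, average, below-average.
observed = [
    [30, 17, 13],
    [4, 11, 15],
]


def chi_square_independence(table):
    """Chi square and df for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df


chi_square, df = chi_square_independence(observed)
print(f"chi square = {chi_square:.2f}, df = {df}")
```

The result is compared with the tabulated critical chi square for df = 2 (5.991 at the 0.05 level in standard tables).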




12, (a) Describe the working out of the G test for goodness of fit, mentioning the computational formula and 
the determination of the df according to the nature of the proposed distribution. 

(b) When and how would you apply Yates' correction in the G test ?

(c) Use the G test to find whether or not there is a significant goodness of fit between the following phenotype 
distribution of Drosophila and the Mendelian 9:3:3:1 distribution. 

Phenotypes : Grey-red (AB) Grey-scarlet (Ab) Black-red (aB) Black-scarlet (ab) 

No. of flies : 118 38 36 16 
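
The G statistic for Q.12(c) may be sketched as follows, assuming G = 2 Σ fo ln(fo / fe) with expected frequencies taken from the 9:3:3:1 ratio and no correction applied; the function name is illustrative.

```python
from math import log

observed = [118, 38, 36, 16]  # AB, Ab, aB, ab phenotypes from Q.12(c)
mendelian_ratio = [9, 3, 3, 1]


def g_statistic(observed, ratio):
    """Log-likelihood ratio statistic G = 2 * sum(fo * ln(fo / fe))."""
    n = sum(observed)
    expected = [n * part / sum(ratio) for part in ratio]
    # Classes with a zero observed frequency contribute nothing to the sum.
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


g = g_statistic(observed, mendelian_ratio)
df = len(observed) - 1
print(f"G = {g:.3f}, df = {df}")
```

G is referred to the chi square table with the same df as the corresponding chi square test.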

13. (a) What is a G test of independence ? Describe how it is worked out, mentioning the computational formula
and the df.

(b) When and how would you use Yates' correction in such a test ?

(c) Solve the psychological problem presented in the preceding Q.11(d), using the G test instead of the
chi square test.


14. (a) Out of 80 diabetics, 32 were found suffering from hypercholesterolemia while the rest had normal serum
cholesterol. Out of 70 nondiabetics, only 14 were hypercholesterolemic. Use chi square test of
independence to find if there is any significant association between diabetes and hypercholesterolemia.
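
For the fourfold table of Q.14(a), the shortcut that avoids expected frequencies can be sketched as below. It assumes chi square = N(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)], with Yates' correction taken as reducing |ad - bc| by N/2; the function name and the `yates` flag are illustrative only.

```python
def fourfold_chi_square(a, b, c, d, yates=False):
    """Chi square for a 2 x 2 table with cells a, b (row 1) and c, d (row 2)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' correction for continuity, never letting the term go negative.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Diabetics: 32 hypercholesterolemic, 48 normal.
# Nondiabetics: 14 hypercholesterolemic, 56 normal.
chi_square = fourfold_chi_square(32, 48, 14, 56)
chi_square_yates = fourfold_chi_square(32, 48, 14, 56, yates=True)

print(f"chi square = {chi_square:.2f}")
print(f"with Yates' correction = {chi_square_yates:.2f}")
```

Either value is compared with the tabulated critical chi square for df = 1 (3.841 at the 0.05 level in standard tables).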


(h) Out of 45 hypertensive humans, 28 were found to be hyper-reactors to cold, showing u rise of more 

T H *i n " ,c " diaMolic pressure on exposure »ocold; but the rema.. 17 were normoreacI 

with lest than 20 mm Hg H ncoUc ;! , ,i„.ive humans, only 

7 were hypet reacton toco then wen aornxxcacton. Apply if Independence to find 

whether or not there is a significant association between hypertension and hyper-reaction to cold. 


CHAPTER 10

1. Fill up the blanks in the following statements with the correct words chosen from those given within the
parentheses below.

(a) Validity of a test is affected mostly by ___ errors of measurement while reliability is affected mostly
by ___ errors of measurement.

(b) Reliability coefficient is that proportion of the ___ variance of test scores which is its ___ variance.

(c) Reliability coefficient is a measure of ___ of a test while the index of reliability is a measure of ___
correlation.

(d) The consistency of results of a test on repeated application on the same group is given by its ___
while the capacity of a test to measure the specific variable in exclusion of others is given by its ___.

(e) Reliability depends on the proportion of ___ variance in a test while validity depends basically on
the ___ variance.

(f) ___ variables are not deliberately used in an experiment but still affect the dependent variable while
___ variables are deliberately used for studying their effects in the experiment.

(g) Coefficient of ___ is the reliability coefficient estimated by the ___ method of estimating reliability.

(h) The capacity of test scores to denote the true scores is given by the ___ validity coefficient which
is the square-root of the ___ coefficient of that test.

(reliability, relevant, true, validity, equivalent, test-whole, independent, random, common-factor,
reliability, systematic, self-correlation, alternate-forms, total, intrinsic, true)



of Column 2 and put the serial number of the Utter 


2. Match each item of Column 1 with the correct item 
in the space after the former. 

Column 1

(a) coefficient of stability _____
(b) difficulty value _____
(c) relevant variable _____
(d) coefficient of internal consistency _____
(e) expectancy table _____
(f) coefficient of equivalence _____
(g) treatment variable _____

Column 2

(i) parallel-forms
(ii) extraneous variable
(iii) independent variable
(iv) test-retest
(v) construct validity
(vi) item analysis
(vii) rational equivalence
(viii) scattergram


3. Mark the odd item in each of the following series with a tick (✓) mark.

(a) : (i) index of discrimination, (ii) coefficient of internal consistency, (iii) index of reliability, (iv)
coefficient of equivalence.

(b) : (i) organismic variables, (ii) treatment variables, (iii) situational relevant variables, (iv) stimulus
variables.

(c) : (i) K-R 20, (ii) Spearman-Brown formula, (iii) index of difficulty, (iv) Kuder-Richardson formula 21.

(d) : (i) predictive validity, (ii) content validity, (iii) … validity, (iv) criterion-related validity.

(e) : (i) coefficient of internal consistency, (ii) …, (iii) coefficient of test-retest reliability,
(iv) coefficient of stability and equivalence.


4. Fill up the blanks in each of the following by correct words chosen from those within the
parentheses below.

(a) Reliability of a test is affected mostly by _____ errors of measurement while validity is affected mostly
by _____ errors of measurement.

(b) Coefficient of internal consistency of a test measures both the _____ of its items and _____ of its …

(c) The increase in the number of alternative answers to test items _____ the guess factor while the increase
in the number of test items _____ the reliability coefficient.

(d) Both _____ variables and _____ variables include organismic variables.

(e) The consistency of results of a test on its repeated application on the same sample is given by the _____ of
that test while the capacity of a test to measure … a specific variable …

(f) _____ validity assesses a test as a measure of the … while the _____ validity assesses a test as a measure of …

(g) Reliability of a test depends on _____ difficulty levels and _____ … of the test items.

(h) Predictive validity of a test is dependent upon _____ … levels and _____ correlations of its items.

(homogeneity, low, systematic, decreases, identical, …)


5. Compare each following pair of items.

(a) Organismic and stimulus variables.

(b) Relevant variables and intervening variables.

(c) Power and speed tests. 

(d) Ratio IQ and deviation IQ. 

(e) C scale and stanine. 

(f) Random errors of measurement and systematic errors.

(g) T scale and C scale.

(h) Correlation table and expectancy table.

(i) Linear transformations and nonlinear transformations.

6. (a) Explain the difficulty value and the discriminatory value of test items. 

(b) Discuss the properties of difficulty value. 

(c) Describe the methods of estimation of discriminatory value. 

7. (a) Describe different classes of independent variables of psychological experiments. 

(b) Explain what are the relevant variables in psychological experiments and describe their classes with 
examples. 

(c) What are intervening variables ? 

8. (a) What is meant by the reliability of a test ?

(b) Discuss the test-retest method and the split-half method of estimating the reliability of a test.

(c) Use the Spearman-Brown formula to work out how long a test, consisting of 22 items and having a
reliability coefficient of 0.32, should be made to raise its reliability coefficient by 0.18.
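
A minimal sketch of the arithmetic behind question 8(c), assuming the usual form of the Spearman-Brown formula, r_n = n r / [1 + (n - 1) r], where n is the factor by which the test is lengthened. The function names are illustrative and do not come from the text.

```python
def spearman_brown(r_old, n):
    """Predicted reliability when a test is lengthened n times."""
    return n * r_old / (1 + (n - 1) * r_old)


def lengthening_factor(r_old, r_target):
    """Spearman-Brown formula solved for n, given the target reliability."""
    return r_target * (1 - r_old) / (r_old * (1 - r_target))


r_old = 0.32
r_target = r_old + 0.18          # raising 0.32 by 0.18 gives a target of 0.50
n = lengthening_factor(r_old, r_target)
new_length = 22 * n              # original test has 22 items

print(f"lengthening factor n = {n:.3f}")
print(f"required length = {new_length:.2f} items")
print(f"check: reliability at that length = {spearman_brown(r_old, n):.3f}")
```

Under these assumptions the test has to be a little more than doubled; the fractional item count would be rounded up in practice.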

9. (a) Describe the coefficients for expressing the equivalent-forms reliability.

(b) Discuss the alternate-forms method and the rational equivalence method for estimating test reliability.

(c) Use the Kuder-Richardson formula 21 to work out the coefficient of internal consistency of a test of
40 items where the mean and the SD of the total scores amount to 28 and 4.5 respectively.

10. (a) Discuss the split-half method and the rational equivalence method for estimating test reliability.

(b) Mention the Spearman-Brown formula and the Kuder-Richardson formula 20, and their respective uses.

(c) Work out the coefficient of internal consistency of a test having 36 items, 9.45 as the sum of item
variances, and 4.78 as the SD of the total test.
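
The two computations asked for in 9(c) and 10(c) can be sketched as follows. This assumes the standard textbook forms of the Kuder-Richardson formulas (K-R 21 from the mean and SD of total scores, K-R 20 from the sum of item variances); the function names are illustrative only.

```python
def kr21(k, mean, sd):
    """Kuder-Richardson formula 21: k items, mean and SD of total scores."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))


def kr20(k, sum_item_variances, sd):
    """Kuder-Richardson formula 20: k items, sum of item variances, SD of totals."""
    return (k / (k - 1)) * (1 - sum_item_variances / sd ** 2)


r_9c = kr21(k=40, mean=28, sd=4.5)                  # data of question 9(c)
r_10c = kr20(k=36, sum_item_variances=9.45, sd=4.78)  # data of question 10(c)

print(f"K-R 21 coefficient for 9(c):  {r_9c:.3f}")
print(f"K-R 20 coefficient for 10(c): {r_10c:.3f}")
```

Both coefficients come out close to 0.60 under these assumptions.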

11. (a) Give a brief account of different classes of variables involved in an experiment.

(b) What are extraneous variables ? Describe different classes of extraneous variables with examples.

(c) What are intervening variables ?

12. (a) Discuss the relations between the reliability and the validity of a test.

(b) What are criterion-related validities ? Describe different types of criterion-related validities.

(c) What are psychological constructs ? Give an account of construct validity and its types.


13. (a) Use the Spearman-Brown formula to explain how the reliability of a test is related to the length of the
test.

(b) Compare the effects of lengthening of the test on validity and reliability.

(c) A performance test has 40 test items, a reliability coefficient of … and a validity coefficient of 0.52.
What would happen to its reliability and validity if its length is increased to 60 ? Also work out its
length required for a validity of 0.60.

14. (a) Mention the basis of content validity and describe what …

(b) What are the different ways of working out content validity ?

(c) Discuss the merits and demerits of the split-half method and of the alternate-forms method of estimating
reliability.

15. (a) Describe briefly the working out of the content validity for a test for job performance, and that of the
construct validity for an intelligence test.

(b) Describe the reliability of a psychological test in terms of the variances of the test scores. What is the
SE of measurement ?

(c) The reliability coefficient and the SD of the total test scores were found to be 0.65 and 8.5 respectively.
Compute the true variance and the error variance of the test scores, the SE of measurement, and the
proportion of the error variance in the total variance.
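
A short sketch of the quantities requested in 15(c), assuming the classical relations: true variance = reliability × total variance, error variance = total variance - true variance, and SE of measurement = SD × sqrt(1 - reliability). The variable names are illustrative.

```python
import math

r_tt = 0.65     # reliability coefficient given in the question
sd_total = 8.5  # SD of the total test scores given in the question

total_variance = sd_total ** 2
true_variance = r_tt * total_variance
error_variance = total_variance - true_variance
se_measurement = sd_total * math.sqrt(1 - r_tt)   # equals sqrt(error_variance)
error_proportion = error_variance / total_variance

print(f"total variance      = {total_variance:.4f}")
print(f"true variance       = {true_variance:.4f}")
print(f"error variance      = {error_variance:.4f}")
print(f"SE of measurement   = {se_measurement:.3f}")
print(f"error proportion    = {error_proportion:.2f}")
```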

16. (a) What is a norm ? State briefly its significance in psychological tests. 

(b) Write briefly about nonlinear and linear transformations of test scores with examples.

(c) Describe the working out, advantages and limitations of the T scores.

17. (a) Compare the C scale and the stanine scale, mentioning their advantages and limitations.

(b) Describe the significance of factor analysis in psychological test construction. 

(c) Discuss mentioning the computational formulae how the factor theory theorems can be used in working 
out validities of tests. 


CHAPTER 11 

1. Mark the odd item in each of the following series.

(a) : (i) Scheffe's F test, (ii) Gabriel's SS-STP, (iii) multiple comparison t test, (iv) Bonferroni-modified
t test.

(b) : (i) error variance, (ii) between-columns variance, (iii) interaction variance, (iv) between-rows variance.

(c) : (i) … transformation, (ii) logarithmic transformation, (iii) … transformation, (iv) linear …

(d) : (i) Kruskal-Wallis H, (ii) chi square, (iii) Scheffe's F, (iv) Mann-Whitney U.

2. Complete the following statements by choosing and marking by a tick (✓) the correct alternative.

(a) A significant F ratio in a one-way model I anova for more than two groups has to be followed by :
(i) multiple comparison test, (ii) omega square computation, (iii) both, (iv) neither of these.

(b) A significant F ratio in a one-way model II anova with two groups has to be followed by :

(i) multiple comparison t test, (ii) omega square computation, (iii) neither of the two, (iv) both (i)
and (ii).

(c) Where two treatment variables have been used in an experiment, the anova to be used should be : 

(i) two-way model I anova, (ii) one-way model I anova, (iii) one-way model II anova, (iv) two-way 
model II anova. 

(d) A significant F ratio in a one-way model I anova for only two groups should be followed by : 

(i) Scheffe's F test, (ii) omega square computation, (iii) working out of added variance component, (iv)
both (i) and (ii).

(e) A significant Kruskal-Wallis H in a model I anova for more than two groups has to be followed by :
(i) Gabriel's SS-STP, (ii) Scheffe's F test, (iii) Bonferroni-modified t test, (iv) Mann-Whitney U test.

(f) The order effect may be eliminated by :

(i) random sampling, (ii) arc-sine transformation, (iii) random sequence of treatment levels, (iv) none 
of these. 


3. Fill up the blanks in each of the following statements by correct words chosen from those given within
the parentheses below.

(a) The F_i ratio in a two-way anova is the ratio between the _____ variance and the _____ variance.

(b) Homoscedasticity implies that the groups used in an experiment initially possess _____ of variances,
the differences between the latter being due to their _____ errors only.

(c) In arc-sine transformation, the transformed score is the _____ whose _____ is the square-root of the
raw score.

(d) The F ratio in a one-way anova has the _____ variance as its numerator and the _____ variance as its
denominator.

(e) Where only _____ variables have been applied in the experiment, a _____ model anova has to be used.

(f) Scheffe's test is an _____ multiple comparison test while Gabriel's SS-STP is an _____ multiple
comparison test.

(g) The F_c ratio in a two-way model I anova has the _____ variance as the numerator while the F_r ratio
in it has the _____ variance as its numerator.

(a-priori, within-cells, fixed, within-groups, type II, interaction, between-rows, after-design, sine, between-
columns, homogeneity, between-groups, angle, sampling, treatment)

4. Match each item of Column 1 with the correct item of Column 2 and put the serial number of the latter
in the space after the former.

Column 1

(a) Scheffe's F test _____
(b) one-way anova _____
(c) interaction variance _____
(d) model II anova _____
(e) Gabriel's SS-STP _____
(f) Bonferroni modification _____
(g) model I anova _____
(h) Kruskal-Wallis H _____

Column 2

(i) multiple-comparison t test
(ii) Mann-Whitney test
(iii) classification variable
(iv) a-posteriori test
(v) single independent variable
(vi) two-way anova
(vii) lesser mean square
(viii) before-design multiple comparison
(ix) treatment variable


5. (a) Discuss the assumptions underlying the one-way anova.

(b) Why is anova preferable to t test ?

(c) Work out one-way anova to find whether or not there is a significant difference between the hourly
oxygen consumptions (ml per 100 g bodyweight) of the following sample of parakeets, respectively
before and after exposure to a pesticide.

, n g 9 10- 

Individual : 1 2 _ 3 4 5 

Before pesticide: 170 185 160 155 175 16S 

155 nsn ' no 130 135 148 145 


After pesticide 


2 . 

185 

160 


3 

160 

130 


165 

148 


150 

128 


If there be a significant difference, work out the strength of association between oxygen consumption
and the levels of pesticide.

6. (a) Explain with examples what you mean by one-way anova, model I anova and model II anova.

(b) Describe how you would work out the variance ratio from the dependent variable scores in a one-way
anova and interpret it.

(c) Apply one-way anova to the following pulmonary ventilation data (L/min) of respectively men and
women to find if the mean ventilation differs significantly between the sexes. In case of a significant
difference, work out the added variance component between the groups.

Men : 6.55, 7.50, 7.26, 9.00, 8.50, 6.25, 7.30, 8.20, 7.45, 8.25, 8.00.

Women : 6.10, 6.00, 5.80, 6.35, 6.00, 5.50, 5.75, 6.15, 5.30.
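
The one-way anova of question 6(c) can be sketched as below. This is an illustration only: the helper names are hypothetical, and the added variance component uses the unequal-sample-size average n0 = [N - (Σn²)/N] / (a - 1), which is an assumption about the intended formula rather than something stated in this section. The computed F still has to be compared with the tabled critical F for the corresponding degrees of freedom.

```python
def one_way_anova(groups):
    """Partition the sum of squares for a one-way anova on a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


def added_variance_component(groups, ms_between, ms_within):
    """Model II estimate (MS_between - MS_within) / n0 for unequal group sizes."""
    n_total = sum(len(g) for g in groups)
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (len(groups) - 1)
    return (ms_between - ms_within) / n0


# Pulmonary ventilation data (L/min) from question 6(c).
men = [6.55, 7.50, 7.26, 9.00, 8.50, 6.25, 7.30, 8.20, 7.45, 8.25, 8.00]
women = [6.10, 6.00, 5.80, 6.35, 6.00, 5.50, 5.75, 6.15, 5.30]

result = one_way_anova([men, women])
s2_added = added_variance_component([men, women],
                                    result["ms_between"], result["ms_within"])

print(f"F = {result['F']:.2f} with df = "
      f"({result['df_between']}, {result['df_within']})")
print(f"added variance component = {s2_added:.3f}")
```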

7. (a) Explain what are a-priori and a-posteriori multiple comparison tests.

(b) Describe how you would work out the Bonferroni-modified multiple comparison t test and the Scheffe's
F test and interpret the computed statistics.

(c) Apply one-way anova to find whether or not there are significant differences between the performance
test scores of the following three groups of students after they have practised according to three respective
practice schedules.


Group I : 10, 17, 19, 16, 18, 12, 15, 13, 19, 11.
Group II : 19, 22, 24, 23, 27, 18, 25, 20, 29.
Group III : 29, 28, 33, 30, 29, 32, 34, 35, 27, 31.


If there be significant differences, (i) apply Scheffe's F test to find if the means of Groups II and III
differ significantly, and (ii) work out the strength of association of the test scores with the practice
schedule.

8. (a) When and how would you work out omega square and added variance component ?

(b) Describe the computation and interpretation of an after-design multiple comparison test.

(c) Following are the amounts of ethereal sulfates (mg) in 24 hours' urine of three groups of humans, kept
on low-, medium- and high-protein diets respectively. Use anova to find if the protein
contents cause significant differences in the urinary ethereal sulfate excretion.

Group I : 90, 102, 100, 106, 110, 94, 98, 85, 100, 83.
Group II : 120, …, 123, 125, 129, 132, 136, 131, 127, 130.
Group III : 141, 143, 138, 144, 139, 146, 150, 152, 140, 139.

In case of significant differences, (i) apply multiple comparison t test with Bonferroni modification to
find if the means of Groups I and II differ significantly and (ii) work out the strength of association
between urinary ethereal sulfates and dietary protein contents.


9. (a) Explain what is meant by a two-way anova without replication.

(b) Describe how you would partition the sum of squares and work out
the variance ratios in a two-way anova without replication.

(c) Apply two-way anova without replication to the following three groups of tracheal ventilation scores
(ml/min) of a sample of beetles, exposed to three successive levels of sulfur dioxide, to find (i) whether sulfur
dioxide significantly changes the tracheal ventilation scores, and (ii) whether there is a significant added
variance component due to random factors between the individuals.

Individual :    1     2     3     4     5     6     7     8     9    10
Group I    :  81.5  85.0  80.7  78.5  80.0  84.5  83.5  79.8  80.4  85.0
Group II   :  70.5  77.2  70.3  67.8  69.2  71.5  71.0  67.3  70.1  77.5
Group III  :  58.6  65.0  59.1  55.7  57.4  59.6  61.2  55.4  58.5  64.0
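
A hedged sketch of the partitioning asked for in question 9, using the beetle data as read from the table above (rows are individuals, columns are the three sulfur dioxide levels). The function name and the dictionary keys are illustrative; with one score per cell, the residual (individual × treatment) mean square is taken as the error term for both variance ratios, which is the usual assumption for a two-way anova without replication.

```python
def two_way_anova_no_replication(table):
    """table[i][j] = score of individual i (row) at treatment level j (column)."""
    r, c = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_resid = ss_total - ss_rows - ss_cols   # remainder after both main effects

    df_rows, df_cols = r - 1, c - 1
    df_resid = df_rows * df_cols
    ms_resid = ss_resid / df_resid
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols, "ss_resid": ss_resid,
        "df_rows": df_rows, "df_cols": df_cols, "df_resid": df_resid,
        "F_rows": (ss_rows / df_rows) / ms_resid,
        "F_cols": (ss_cols / df_cols) / ms_resid,
    }


group_1 = [81.5, 85.0, 80.7, 78.5, 80.0, 84.5, 83.5, 79.8, 80.4, 85.0]
group_2 = [70.5, 77.2, 70.3, 67.8, 69.2, 71.5, 71.0, 67.3, 70.1, 77.5]
group_3 = [58.6, 65.0, 59.1, 55.7, 57.4, 59.6, 61.2, 55.4, 58.5, 64.0]

# One row per individual, one column per sulfur dioxide level.
scores = [list(row) for row in zip(group_1, group_2, group_3)]
anova = two_way_anova_no_replication(scores)

print(f"F (treatment levels) = {anova['F_cols']:.1f}, "
      f"df = ({anova['df_cols']}, {anova['df_resid']})")
print(f"F (individuals)      = {anova['F_rows']:.1f}, "
      f"df = ({anova['df_rows']}, {anova['df_resid']})")
```

Each F is then compared with the tabled critical F for its own pair of degrees of freedom.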


10. (a) When do you use two-way anova with replications ?

(b) Describe how the sum of squares is partitioned and different F ratios are worked out in a two-way model
I anova with replications.

(c) Apply a two-way model I anova to the following fasting blood sugar values (mg/dL) of four groups
of subjects, each consisting of 5 individuals and administered one of the four combinations of the levels
of cortisol and thyroxine, to find the significance of effects of the treatment variables and of their
interaction. Also work out the omega squares for the significant effects.

Fasting blood sugar values

Levels of                        Levels of cortisol
thyroxine         level 1                      level 2

level 1           80, 83, 85, 65, 90           140, 152, 135, 136, 145

level 2           120, 118, 108, 112, 125      180, 168, 176, 175, 184


11. (a) Write about the assumptions and uses of the Kruskal-Wallis H.

(b) How do you work out and interpret Kruskal-Wallis anova in case of an experiment with three groups
of subjects ?

Group I : 80, …, 85, 80, 84, 102, 95, 82, 100.
Group II : 120, 102, 123, 112, 130, 125, 130, 127.
Group III : 130, 142, 150, 145, 148, 139, 150, 140.




Answers 

Chapter 1 : 

1. (a) - (if/); ( b ) - («); (c) - (//); (</) - (iff); (e) - (//); (/) - (0; 0>) “ (*)• 

2. (a) - (lv); (fe) - (vii); (c) - (v); (ff) - (i); (e) - (ii); (/) - (v/ff); (if) - (»0; (*) “ ( v 0- 

3. (a) relative, treatment; (b) independent, dependent; (c) absolute, classification; (d) homogeneous, stratified;
(e) parameter, statistic; (f) probability, choice.

4. (a) - (if); (6) - (iv); (c) - (i); (</) - (iff). 

Chapter 2 : ... 

1. (a)-(iff); (b)-(ii); (c) - (iii); (</) - (ii); (e) - (i); 0) - (iff); (g) - (»)• 

2. (a) - (iv); (h) - (i); ( c ) - (vi); (ff) - (v); (e) - (ii). 

3. (a) measurement, scattergram; (b) continuous, midpoint; (c) c/, X u ; (d) many, jaggedness. 

4 - (a)-(0; (b) - (iv); (c) - (ii); (d) - (iii). 

Chapter 3 ; 

1. (a) - (ii); (ff) _ (i); ( c ) _ (iii); (</) _ (iii); ( e ) - (if). 

2. (a) median, mean; (b) mode, mean; (c) half, one-tenth; (d) all, 0.04.

Chapter 4 : 

1. (a) - (iff); (ff)-(i); (c) - (ii); (d) - (Hi); (e) - (ii). 

2. (a) df, sample-size; (b) relative, absolute; (c) second, first; (d) clumped, relative; (e) squared, no.

3. (a) - (iii); (b) - (ii); (c) - (i); (d) - (iv). 


Chapter 5 : 

1. (a) - 00; (b) - (iii); ( c ) - (i); (d) - (iii); ( e ) _ (,) ; (j) _ (,-,-) 


2. (a) - (ii); (b) - (i); (c) - (iv).

3. (a) statistic, parameter; (b) theoretically, experimentally; (c) parameter, sampling; (d) linearly, standard;
(e) nonlinear, linear.


Chapter 6 : 

L (fl) - (/ ! 7); 0) - (/); (c) -« W-(ff); 00 - (ii); (0 - (iii); fe) - (,)• (/l) _ (m 

2- (a) - (0; (b) - (iv); (c) -*); (d) - (iv); («) - ( /v) . ^ W ’ W ^ 

3- 00 - (v); (b)-(iii); (c) - (vii); (d) - (i); (e) - (iv); (/) - (ii). 

4. (a) no, positive; (b) mean, zero; (c) negatively, higher; (d) Gossett, Gauss; (e) less, equals; (f) Poisson,
variance; (g) positively, not; (h) half, equals.

Chapter 7 : 

L {a) ~ (/); (b) ~ ( " ); 00-(//); (e)-(iu); (/) _ (,•,•) 

2- (a) - (iv); (ff) - (viii); ( c ) - (vii); (J) - (iff); (<?) _ (ic); (/) ' 

3. (a) rise, rise; (b) retained, acceptance; (c) one-tail, two-tail; (d) acceptance, rejection; (e) accepted,
rejection; (f) rise, fall; (g) half, whole.




Chapter 8 : 

1. (a) - (i*)\ (ft) - (v); (c) - (i); (d) - (x); (e) - (iii); (/) - (viii); (g) - (vii); (/*) - (vi); (i) - (ii). 

2 . (a) - (*v); (ft) - 00; (c) - 00; 00 - (*v); 00 - Ov); (0 - (0- 

3. (a) …, variance; (b) partial, regression; (c) SE, estimate; (d) dichotomous, ordinal; (e) negatively, not;
(f) rho, tau.

4. (a) - 00; 00 ~ (O’* O) - 0'v); (d) - (iii). 

Chapter 9 : 

1. (a) - 0»0; (ft) - (0; (c) - (iv); (rf) - (ii). 

2. (a) - 0’v); (ft) - 0*0; (<0 -(v); (d) - (0; («) - (vi). 

3. (a) chi-square, G; (b) independent, unequal; (c) median, median; (d) independence, contingency;
(e) composite, signed; (f) increased, decreased; (g) exceeds, more.

4. (a) - 0'v); (ft) - (0; (c) - 00; 00 - 0*0; (<0 - (***')• 

Chapter 10 : 

1. (a) systematic, random; (b) total, true; (c) self-correlation, test-whole; (d) reliability, validity; (e) true,
common-factor; (f) relevant, independent; (g) equivalent, alternate-forms; (h) intrinsic, reliability.

2. (a) - (iv); (b) - (vi); (c) - (ii); (d) - (vii); (e) - (viii); (f) - (i); (g) - (iii).

3. (a) - (i); (b) - (ii); (c) - (iii); (d) - (ii); (e) - (i).

4. (a) random, systematic; (b) homogeneity, equivalence; (c) decreases, increases; (d) independent,
subject-relevant; (e) reliability, validity; (f) concurrent, predictive; (g) identical, high; (h) differing, low.

Chapter 11 ; 

1. (a) - (ii); (b) - (i); (c) - (iv); (d) - (iii).

2. (a) - (iii); (b) - (iii); (c) - (i); (d) - (ii); (e) - (iv); (f) - (iii).

3. (a) interaction, within-cells; (b) homogeneity, sampling; (c) angle, sine; (d) between-groups, within-groups;
(e) treatment, fixed; (f) a-priori, after-design; (g) between-columns, between-rows.

4. (a) - (viii); (b) - (v); (c) - (vi); (d) - (iii); (e) - (iv); (f) - (i); (g) - (ix); (h) - (ii).




APPENDIX 

Table A. Ordinates at specific x/σ or z scores and areas from the mean
to the z scores of the unit normal curve.

 x/σ   Area   Ordinate |  x/σ   Area   Ordinate |  x/σ   Area   Ordinate |  x/σ   Area   Ordinate
 .00   .0000   .3989   |  .55   .2088   .3429   | 1.10   .3643   .2179   | 1.65   .4505   .1023
 .01   .0040   .3989   |  .56   .2123   .3410   | 1.11   .3665   .2155   | 1.66   .4515   .1006
 .02   .0080   .3989   |  .57   .2157   .3391   | 1.12   .3686   .2131   | 1.67   .4525   .0989
 .03   .0120   .3988   |  .58   .2190   .3372   | 1.13   .3708   .2107   | 1.68   .4535   .0973
 .04   .0160   .3986   |  .59   .2224   .3352   | 1.14   .3729   .2083   | 1.69   .4545   .0957
 .05   .0199   .3984   |  .60   .2257   .3332   | 1.15   .3749   .2059   | 1.70   .4554   .0940
 .06   .0239   .3982   |  .61   .2291   .3312   | 1.16   .3770   .2036   | 1.71   .4564   .0925
 .07   .0279   .3980   |  .62   .2324   .3292   | 1.17   .3790   .2012   | 1.72   .4573   .0909
 .08   .0319   .3977   |  .63   .2357   .3271   | 1.18   .3810   .1989   | 1.73   .4582   .0893
 .09   .0359   .3973   |  .64   .2389   .3251   | 1.19   .3830   .1965   | 1.74   .4591   .0878
 .10   .0398   .3970   |  .65   .2422   .3230   | 1.20   .3849   .1942   | 1.75   .4599   .0863
 .11   .0438   .3965   |  .66   .2454   .3209   | 1.21   .3869   .1919   | 1.76   .4608   .0848
 .12   .0478   .3961   |  .67   .2486   .3187   | 1.22   .3888   .1895   | 1.77   .4616   .0833
 .13   .0517   .3956   |  .68   .2517   .3166   | 1.23   .3907   .1872   | 1.78   .4625   .0818
 .14   .0557   .3951   |  .69   .2549   .3144   | 1.24   .3925   .1849   | 1.79   .4633   .0804
 .15   .0596   .3945   |  .70   .2580   .3123   | 1.25   .3944   .1826   | 1.80   .4641   .0790
 .16   .0636   .3939   |  .71   .2611   .3101   | 1.26   .3962   .1804   | 1.81   .4649   .0775
 .17   .0675   .3932   |  .72   .2642   .3079   | 1.27   .3980   .1781   | 1.82   .4656   .0761
 .18   .0714   .3925   |  .73   .2673   .3056   | 1.28   .3997   .1758   | 1.83   .4664   .0748
 .19   .0753   .3918   |  .74   .2703   .3034   | 1.29   .4015   .1736   | 1.84   .4671   .0734
 .20   .0793   .3910   |  .75   .2734   .3011   | 1.30   .4032   .1714   | 1.85   .4678   .0721
 .21   .0832   .3902   |  .76   .2764   .2989   | 1.31   .4049   .1691   | 1.86   .4686   .0707
 .22   .0871   .3894   |  .77   .2794   .2966   | 1.32   .4066   .1669   | 1.87   .4693   .0694
 .23   .0910   .3885   |  .78   .2823   .2943   | 1.33   .4082   .1647   | 1.88   .4699   .0681
 .24   .0948   .3876   |  .79   .2852   .2920   | 1.34   .4099   .1626   | 1.89   .4706   .0669
 .25   .0987   .3867   |  .80   .2881   .2897   | 1.35   .4115   .1604   | 1.90   .4713   .0656
 .26   .1026   .3857   |  .81   .2910   .2874   | 1.36   .4131   .1582   | 1.91   .4719   .0644
 .27   .1064   .3847   |  .82   .2939   .2850   | 1.37   .4147   .1561   | 1.92   .4726   .0632
 .28   .1103   .3836   |  .83   .2967   .2827   | 1.38   .4162   .1539   | 1.93   .4732   .0620
 .29   .1141   .3825   |  .84   .2995   .2803   | 1.39   .4177   .1518   | 1.94   .4738   .0608
 .30   .1179   .3814   |  .85   .3023   .2780   | 1.40   .4192   .1497   | 1.95   .4744   .0596
 .31   .1217   .3802   |  .86   .3051   .2756   | 1.41   .4207   .1476   | 1.96   .4750   .0584
 .32   .1255   .3790   |  .87   .3078   .2732   | 1.42   .4222   .1456   | 1.97   .4756   .0573
 .33   .1293   .3778   |  .88   .3106   .2709   | 1.43   .4236   .1435   | 1.98   .4761   .0562
 .34   .1331   .3765   |  .89   .3133   .2685   | 1.44   .4251   .1415   | 1.99   .4767   .0551
 .35   .1368   .3752   |  .90   .3159   .2661   | 1.45   .4265   .1394   | 2.00   .4772   .0540
 .36   .1406   .3739   |  .91   .3186   .2637   | 1.46   .4279   .1374   | 2.01   .4778   .0529
 .37   .1443   .3725   |  .92   .3212   .2613   | 1.47   .4292   .1354   | 2.02   .4783   .0519
 .38   .1480   .3712   |  .93   .3238   .2589   | 1.48   .4306   .1334   | 2.03   .4788   .0508
 .39   .1517   .3697   |  .94   .3264   .2565   | 1.49   .4319   .1315   | 2.04   .4793   .0498
 .40   .1554   .3683   |  .95   .3289   .2541   | 1.50   .4332   .1295   | 2.05   .4798   .0488
 .41   .1591   .3668   |  .96   .3315   .2516   | 1.51   .4345   .1276   | 2.06   .4803   .0478
 .42   .1628   .3653   |  .97   .3340   .2492   | 1.52   .4357   .1257   | 2.07   .4808   .0468
 .43   .1664   .3637   |  .98   .3365   .2468   | 1.53   .4370   .1238   | 2.08   .4812   .0459
 .44   .1700   .3621   |  .99   .3389   .2444   | 1.54   .4382   .1219   | 2.09   .4817   .0449
 .45   .1736   .3605   | 1.00   .3413   .2420   | 1.55   .4394   .1200   | 2.10   .4821   .0440
 .46   .1772   .3589   | 1.01   .3438   .2396   | 1.56   .4406   .1182   | 2.11   .4826   .0431
 .47   .1808   .3572   | 1.02   .3461   .2371   | 1.57   .4418   .1163   | 2.12   .4830   .0422
 .48   .1844   .3555   | 1.03   .3485   .2347   | 1.58   .4429   .1145   | 2.13   .4834   .0413
 .49   .1879   .3538   | 1.04   .3508   .2323   | 1.59   .4441   .1127   | 2.14   .4838   .0404
 .50   .1915   .3521   | 1.05   .3531   .2299   | 1.60   .4452   .1109   | 2.15   .4842   .0395
 .51   .1950   .3503   | 1.06   .3554   .2275   | 1.61   .4463   .1092   | 2.16   .4846   .0387
 .52   .1985   .3485   | 1.07   .3577   .2251   | 1.62   .4474   .1074   | 2.17   .4850   .0379
 .53   .2019   .3467   | 1.08   .3599   .2227   | 1.63   .4484   .1057   | 2.18   .4854   .0371
 .54   .2054   .3448   | 1.09   .3621   .2203   | 1.64   .4495   .1040   | 2.19   .4857   .0363


Table A. Ordinates and areas of the unit normal curve (continued).

 x/σ   Area   Ordinate |  x/σ   Area   Ordinate |  x/σ   Area   Ordinate |  x/σ   Area    Ordinate
2.20   .4861   .0355   | 2.50   .4938   .0175   | 2.80   .4974   .0079   | 3.10   .4990    .0033
2.21   .4864   .0347   | 2.51   .4940   .0171   | 2.81   .4975   .0077   | 3.11   .4991    .0032
2.22   .4868   .0339   | 2.52   .4941   .0167   | 2.82   .4976   .0075   | 3.12   .4991    .0031
2.23   .4871   .0332   | 2.53   .4943   .0163   | 2.83   .4977   .0073   | 3.13   .4991    .0030
2.24   .4875   .0325   | 2.54   .4945   .0158   | 2.84   .4977   .0071   | 3.14   .4992    .0029
2.25   .4878   .0317   | 2.55   .4946   .0154   | 2.85   .4978   .0069   | 3.15   .4992    .0028
2.26   .4881   .0310   | 2.56   .4948   .0151   | 2.86   .4979   .0067   | 3.16   .4992    .0027
2.27   .4884   .0303   | 2.57   .4949   .0147   | 2.87   .4979   .0065   | 3.17   .4992    .0026
2.28   .4887   .0297   | 2.58   .4951   .0143   | 2.88   .4980   .0063   | 3.18   .4993    .0025
2.29   .4890   .0290   | 2.59   .4952   .0139   | 2.89   .4981   .0061   | 3.19   .4993    .0025
2.30   .4893   .0283   | 2.60   .4953   .0136   | 2.90   .4981   .0060   | 3.20   .4993    .0024
2.31   .4896   .0277   | 2.61   .4955   .0132   | 2.91   .4982   .0058   | 3.21   .4993    .0023
2.32   .4898   .0270   | 2.62   .4956   .0129   | 2.92   .4982   .0056   | 3.22   .4994    .0022
2.33   .4901   .0264   | 2.63   .4957   .0126   | 2.93   .4983   .0055   | 3.23   .4994    .0022
2.34   .4904   .0258   | 2.64   .4959   .0122   | 2.94   .4984   .0053   | 3.24   .4994    .0021
2.35   .4906   .0252   | 2.65   .4960   .0119   | 2.95   .4984   .0051   | 3.25   .4994    .0020
2.36   .4909   .0246   | 2.66   .4961   .0116   | 2.96   .4985   .0050   | 3.26   .4994    .0020
2.37   .4911   .0241   | 2.67   .4962   .0113   | 2.97   .4985   .0048   | 3.27   .4995    .0019
2.38   .4913   .0235   | 2.68   .4963   .0110   | 2.98   .4986   .0047   | 3.28   .4995    .0018
2.39   .4916   .0229   | 2.69   .4964   .0107   | 2.99   .4986   .0046   | 3.29   .4995    .0018
2.40   .4918   .0224   | 2.70   .4965   .0104   | 3.00   .4987   .0044   | 3.30   .4995    .0017
2.41   .4920   .0219   | 2.71   .4966   .0101   | 3.01   .4987   .0043   | 3.40   .4997    .0012
2.42   .4922   .0213   | 2.72   .4967   .0099   | 3.02   .4987   .0042   | 3.50   .4998    .0009
2.43   .4925   .0208   | 2.73   .4968   .0096   | 3.03   .4988   .0040   | 3.60   .4998    .0006
2.44   .4927   .0203   | 2.74   .4969   .0093   | 3.04   .4988   .0039   | 3.70   .4999    .0004
2.45   .4929   .0198   | 2.75   .4970   .0091   | 3.05   .4989   .0038   | 3.80   .49993   .0003
2.46   .4931   .0194   | 2.76   .4971   .0088   | 3.06   .4989   .0037   | 3.90   .49995   .0002
2.47   .4932   .0189   | 2.77   .4972   .0086   | 3.07   .4989   .0036   | 4.00   .49997   .0001
2.48   .4934   .0184   | 2.78   .4973   .0084   | 3.08   .4990   .0035   |
2.49   .4936   .0180   | 2.79   .4974   .0081   | 3.09   .4990   .0034   |

Table reproduced by permission of McGraw-Hill Book Company, New York, from …,
Statistical Analysis in Psychology and Education, 3rd ed., 197…, and originally from J. …,
Educational Statistics, both published by McGraw-Hill Book Company.
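
The entries of Table A can be checked numerically. The sketch below assumes only the standard definitions for the unit normal curve: the ordinate is exp(-z²/2)/sqrt(2π), and the area from the mean to z is half the error function of z/sqrt(2). The function names are illustrative.

```python
import math


def ordinate(z):
    """Height of the unit normal curve at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def area_mean_to_z(z):
    """Area under the unit normal curve between the mean (z = 0) and z."""
    return 0.5 * math.erf(z / math.sqrt(2))


# Print a few rows in the same layout as Table A: z, area, ordinate.
for z in (0.00, 0.50, 1.00, 1.96, 2.58, 3.00):
    print(f"{z:4.2f}   {area_mean_to_z(z):.4f}   {ordinate(z):.4f}")
```

Rounded to four places, these values should agree with the corresponding rows of the table.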




Table B. Critical values of t.
Levels of significance for two-tail test

Table taken from Table III of R.A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and
Medical Research, published by Longman Group Ltd., London (previously published by Oliver & Boyd,
Edinburgh), by permission of the authors and publishers.

[Figure: tabled critical t (two-tail), with α/2 in each tail.]


Table C. Critical values of chi square. Figures at the top
of the table indicate levels of significance (α).

df     .50     .30     .20     .10     .05     .02     .01    .001
 1    0.46    1.07    1.64    2.71    3.84    5.41    6.64   10.83
 2    1.39    2.41    3.22    4.60    5.99    7.82    9.21   13.82
 3    2.37    3.66    4.64    6.25    7.82    9.84   11.34   16.27
 4    3.36    4.88    5.99    7.78    9.49   11.67   13.28   18.46
 5    4.35    6.06    7.29    9.24   11.07   13.39   15.09   20.52
 6    5.35    7.23    8.56   10.64   12.59   15.03   16.81   22.46
 7    6.35    8.38    9.80   12.02   14.07   16.62   18.48   24.32
 8    7.34    9.52   11.03   13.36   15.51   18.17   20.09   26.12
 9    8.34   10.66   12.24   14.68   16.92   19.68   21.67   27.88
10    9.34   11.78   13.44   15.99   18.31   21.16   23.21   29.59
11   10.34   12.90   14.63   17.28   19.68   22.62   24.72   31.26
12   11.34   14.01   15.81   18.55   21.03   24.05   26.22   32.91
13   12.34   15.12   16.98   19.81   22.36   25.47   27.69   34.53
14   13.34   16.22   18.15   21.06   23.68   26.87   29.14   36.12
15   14.34   17.32   19.31   22.31   25.00   28.26   30.58   37.70
16   15.34   18.42   20.46   23.54   26.30   29.63   32.00   39.25
17   16.34   19.51   21.62   24.77   27.59   31.00   33.41   40.79
18   17.34   20.60   22.76   25.99   28.87   32.35   34.80   42.31
19   18.34   21.69   23.90   27.20   30.14   33.69   36.19   43.82
20   19.34   22.78   25.04   28.41   31.41   35.02   37.57   45.32
21   20.34   23.86   26.17   29.62   32.67   36.34   38.93   46.80
22   21.34   24.94   27.30   30.81   33.92   37.66   40.29   48.27
23   22.34   26.02   28.43   32.01   35.17   38.97   41.64   49.73
24   23.34   27.10   29.55   33.20   36.42   40.27   42.98   51.18
25   24.34   28.17   30.68   34.38   37.65   41.57   44.31   52.62
26   25.34   29.25   31.80   35.56   38.88   42.86   45.64   54.05
27   26.34   30.32   32.91   36.74   40.11   44.14   46.96   55.48
28   27.34   31.39   34.03   37.92   41.34   45.42   48.28   56.89
29   28.34   32.46   35.14   39.09   42.56   46.69   49.59   58.30
30   29.34   33.53   36.25   40.26   43.77   47.96   50.89   59.70

Table taken from Table IV of R.A. Fisher and F. Yates,
Statistical Tables for Biological, Agricultural and Medical
Research, published by Longman Group Ltd., London
(previously published by Oliver & Boyd, Edinburgh), by
permission of the authors and publishers.

Table D. Critical values of Spearman's rho.

Levels of significance for two-tail test
 n     .10     .02     .01
 5    .900   1.000
 6    .829    .943   1.000
 7    .714    .893    .929
 8    .643    .833    .881
 9    .600    .783    .833
10    .564    .746    .794
12    .506    .712    .777
14    .456    .645    .715
16    .425    .601    .665
18    .399    .564    .625
20    .377    .534    .591
22    .359    .508    .562
24    .343    .485    .537
26    .329    .465    .515
28    .317    .448    .496
30    .306    .432    .478
       .05     .01    .005
Levels of significance for one-tail test

Reprinted by permission of McGraw-Hill
Book Company, New York, from J. P.
Guilford and B. Fruchter, Fundamental
Statistics in Psychology and Education, 6th
ed., 1978, and originally from W.J. Dixon
and F.J. Massey, Jr., Introduction to
Statistical Analysis, 1951, both published
by McGraw-Hill Book Company.



Table E. Critical values of Mann-Whitney U.
α = 0.02 (two-tail) and 0.01 (one-tail).

 n       9    10    11    12    13    14    15

 3       1     1     1     2     2     2     3
 4       3     3     4     5     5     6     7
 5       5     6     7     8     9    10    11
 6       7     8     9    11    12    13    15
 7       9    11    12    14    16    17    19
 8      11    13    15    17    20    22    24
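
The entries of Table E can be reproduced from the exact null distribution of U. The following Python sketch is an illustration added here, not the authors' procedure, and the names count_u and critical_u are assumptions. It counts the arrangements giving each value of U with the usual recurrence on the origin of the largest observation, and takes as critical value the largest U whose lower-tail probability does not exceed the one-tail level, so that an observed U equal to or smaller than the critical value is significant.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(u, m, n):
    """Number of arrangements of m and n observations that give U = u."""
    if u < 0:
        return 0
    if u == 0:
        return 1
    if m == 0 or n == 0:
        return 0
    # Largest observation from the first sample adds n to U;
    # largest observation from the second sample adds nothing.
    return count_u(u - n, m - 1, n) + count_u(u, m, n - 1)


def critical_u(m, n, alpha_one_tail):
    """Largest U with P(U <= value) <= alpha (one-tail); None if none exists."""
    total = comb(m + n, m)
    cumulative = 0
    critical = None
    for u in range(m * n + 1):
        cumulative += count_u(u, m, n)
        if cumulative > alpha_one_tail * total:
            break
        critical = u
    return critical


if __name__ == "__main__":
    for m in (3, 4, 5):
        print(m, [critical_u(m, n, 0.01) for n in range(9, 16)])
```

With the one-tail level 0.01 of Table E, critical_u(3, 9, 0.01) gives 1 and critical_u(4, 9, 0.01) gives 3, in agreement with the first column of the table.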


Table G. Critical upper and lower T values (T_U and T_L) for Wilcoxon's rank sum test.
N, M = sample sizes; α = 0.005 (one-tail) and 0.01 (two-tail).





[Body of the table is illegible in the scanned source.]




[Body of the table is illegible in the scanned source.]




M + 23 276,534 310,586 347,639 385,695 425,753 467,813 511,875 556,940 604,1006 654,1074 705,1145 

M + 24 280,545 315,597 352,651 391,707 431,766 474,826 518,889 564,954 612,1021 662,1090 714,1161 

M + 25 285,555 320,608 357,663 397,719 438,778 480,840 525,903 571,969 620,1036 670,1106 722,1178 





[Body of the table is illegible in the scanned source.]


Table G. Critical upper and lower T values for Wilcoxon’s rank sum test (continued). 

α = 0.025 (one-tail) and 0.05 (two-tail). 




[Body of the table is illegible in the scanned source.]




Table H. Critical values of F. α = 0.05 (roman type) and 0.01 (bold-face type).

Columns: degrees of freedom for greater mean square.
Rows: degrees of freedom for lesser mean square.

Column headings: 1  2  3  4  5  6  7  8  9  10  11  12  14  16  20  24  30  40  50  75  100  200  500  ∞


1 

161 

4052 

200 

4999 

216 

5402 

225 

5635 

230 

5764 

234 

5859 

237 

5928 

239 

5981 

6022 

242 

6058 

243| 244 

6082 6106 

245 

6142 

246 

6169 

248 

6208 

249 

6234 

250 

625M i 

251 

6286 

252 

6302 

J 

6323 

■ff 

63341 

254 
6352 4 

254 

b W»l i 

254 

6366 

2 

18.51 

98.49 

19.00 

99.01 

19 16 
99.17 

19.25 

99.25 

19.30 

99.30 

19.33 

99.33 

19.36 

99.34 

19.37 

99.36 

19.3* 

99 J8 

19.39 19.40 19 41 

99.40j99.4l 99.J2 < 

1942 
^9.43 < 

1945 

*9.44 

1944 

*9 4 5 « 

19.45 1 

*9 46 « 

19.46 1 

>9 47 S 

9 47 ! 

'9.48 ! 

19.47 1 

*9 48 < 

19.48 1 

>9 49 <, 

19 49 | 

19 49 5 

949 1 

>9.49 9 

9 50 1 

►9.50 9 

950 

>9.50 

3 

10.13 

34.12 

9.55 

30.81 

9.28 

29.46 

9.12 

28.71 

9.01 

28.24 

8 94 

27.91 

8.88 

27.67 

8 84 

27.49 

8 81 8.78 8 76 8.74 
27..M 27.23 27.13 27.05 . 

8.71 8.69 
26.92 26.83 

8.66 

26.6^ 

8 64 

26 60 ; 

lb 50 : 

8.60 

!6.41 

8.58 
2635 ; 

8.57 

16.27 ; 

8.56 

i6.i'; 

8.54 
16 18 2 

8 54 

!6.i4; 

8.53 

16.12 

4 

7.71 

2120 

6.94 

18.00 

6.59 

16 69 

6.39 

15 98 

6.26 

15.52 

6.16 

15.21 

6.09 

14.98 

6.04 

14 86 

6 00 

1466 

5.96 

1454 

5.931 5.91 5.87 5 84 

14.45 14.37 14 24 14.15 

5.80 

1462 

5.77 

13.93 

5.74 

13.83 1 

5.71 

13.74 

5.70 

13.69 

5.68 

13.61 1 

5.66 
13.57 1 

5.65 
13.52 1 

5 64 
13.48 1 

5.63 

13.46 

5 

6.61 

16.26 

5.79 

13.27 

5.41 

1106 

5.19 

11.29 

505 

10.97 

4 95 

10.67 

4.88 

10 45 

4.82 

10 27 

4 78 

10.15 

4.74 

10.05 

£3 

4.681 

9.82 

aa 

460 

9.68 

4 56 
935 

4.53 

9.47 

4.50 

938 

4.46 

9.29 

4 44 
9.24 

4 42 
9.17 

4.40 

9.13 

4.38 

9.07 

4.37 

9.04 

4.36 

9.02 

6 

5.99 

13.74 

5.14 

10.92 

4.76 

9.78 

4.53 

9.15 

4.39 

8.75 

4.28 

8.47 

4.21 

8.24 

4.15 

&.10 

rij 

d 

403 

7.791 

ic.\ 

7.72 

1 Axl 

7.60 

3.92 

732 

3.87 

739 

3 84 
731 

3.81 

7.23 

3.77 

7.14 

3.75 

7.09 

3.72 

7.02 

3.71 

6.99 

3.69 

6.94 

3.68 

6.90 

3.67 

6X8 

7 

5.59 

1125 

4.74 

9.58 

4.35 

8.45 

4.12 

7.85 

3.97 

7.46 

3.87 

7.19 

7*0 

6 84 

d 

61 r 

3 60 

3.57 

6.47 

332 

635 

3 49 
637 

3.44 

6.15 

3.41 

6.07 

3.38 

5.98 

3.34 

5.90 

3.32 

5.85 

3.29 

5.78 

3.28 

5.75 

3.25 

5.70 

3.24 

5.67 

3.23 

5.65 

% 

5 . 3 ; 

11.21 

4.4C 
. 8.6! 

4.07 

7.89 

384 

7.01 

3 69 

6.63 

3.58 

6.37 

6.19 

*44 
it 5 

. . j 

3.39| 

5.91 

JJ4 

5 

3-311 

5.74 

< 4 ” I 
| 

554 

3 20 

548 

3.15 

534 

3.12 

5.28 

3 08 

5,20 

3.05 

5.11 

3.03 

5.06 

3.00 

5.00 

2.98 

4.96 

2.96 

4.91 

2.94 

4.88 

2.93 

8.84 

9 

5.1 

10.5 

4.2< 

8.0 

[> 3.86 
l 6.95 

3.63 

6.43 

3.48 

t.Ol 

3.37 

5.80 

562 

5-47 

5J5 

id 

5 18 

mJ 

5 00 

298 

492 

293 

480 

290 

4.73 

2.86 

4.64 

2.82 

4.56 

2.80 

4.51 

2.77 

4.45 

176 

4.41 

2.73 

4.36 

2.72 

433 

2.71 

431 

10 

4.9 

100 

6 4 1 
4 7.5 

0 3.7 

6 6.5! 

3.41 

5.95 

3.33 

5 04 

3.22 

5.39 

3.14 

3.21 

3j07 

5*4 

4.95 

J 

4X5>| 

2 941 
4.7i| 

191 

4.71 

2JO 

4 60 

182 

4 

4.41 

274 

433 

2.70 

4.25 

167 

4.17 

2.64 

4.12 

2.61 

4.05 

2.59 

4.01 

2.56 

3.96 

2.55 

3.93 

2.54 

3.91 

II 

4.3 

9.6; 

1 3.9 
5 7.2i 

B 3.5! 

0 6 . 2 ; 

3.34 
i 5.6; 

3 20 
r 5.37 

3 09 
5.07 

3 jOI 

4 M 

2 95 

4.74 

2*0 

AM 

ix*| 

itl] 

4 4 i 

4 40' 

2£ 

421 

165 

410 

2.61 

4.02 

2.57 

3.94 

2.53 

3.86 

2.50 

3.80 

2.47 

3.74 

2.45 

3.70 

2.42 

3.66 

2.41 

3.62 

2.40 

3.60 

12 

4.7! 

9.3. 

5 3.81 

1 6.9. 

B 3.4! 
J 5.9! 

) 3.2< 
> 5.41 

> 3.11 

1 5 0c 

3 00 

4.82 

2 92 

4.65 

JIJ 

4 50 

; M 

4 



4 U' 

415 

260 

3.91 

154 

3.14 

2.50 

3.78 

2.46 

3.70 

2.42 

3.61 

2.40 

3.56 

2.36 

3.48 

2.35 

3.46 

2.32 

3.41 

2.31 

33* 

2.30 

3.24 

13 

4 6; 
9,o; 

r 3.8( 

* 6.7( 

) 3.41 
) 5.74 

1 3. If 
1 5.2C 

i 3.07 

I 4 KC 

! 2.92 
> 4.67 

2 64 

4.44 

iS 

4.19 

n 

\\ 

H 

iS 

231 

3.78 

2.46 

3.67 

2.42 

3.59 

2.38 

3.51 

234 

3.42 

132 

3.37 

2.28 

3.36 

2.26 

3.27 

2.24 

3.21 

12 ; 

3,n 

\ 121 

1 3.16 

14 

4.60 

8.86 

3.74 

6.51 

3.34 

5.56 

I 3.11 
5.IJ 

2.96 

4.69 

i 185 

4.46 

4.28 

2.70 

4.14 

163 

4.03 

ijS| 

\ l u \ 

y u 

3.60 

246 

3.70 

244 

3.62 

139 

331 

135 

3,43 

2.31 

3.34 

127 

3.26 

2.24 

3.21 

2.21 

XU 

2. IS 
1 3.11 

> 2.16 

l 3.06 

2.14 

* 3 . 0 ; 

l 113 
l 3.00 

15 

4.54 

8.68 

3.68 

6.36 

3.29 

5.42 

3.06 

4.89 

190 

4.56 

179 

432 

170 

4.14 

164 

4 00 

•% 40 

3.89 

155 

JL» 

231 

X73 

148 

Jl67| 

243 

334 

139 

3.48 

133 

336 

2 29 

3.29 

2.25 

3.20 

121 

3.13 

2.If 

i 3 . 0 ; 

\ 2.1! 

7 3.0< 

> 2.i; 

) 19* 

> 2.IC 
7 19; 

) 2.0! 

1 28! 

* 207 

> 187 

16 

4.49 

8.53 

3.63 

6.23 

3.24 

5.29 

3.01 

4.77 

2.85 

4.44 

174 

4.20 

166 

4.03 

159 

3.89 

134 

3.78 

149 

3.69 

145 

3.61 

142 

335 

237 

3-45 

233 

337 

228 

3.25 

124 

3.15 

120 

3.10 

lu 

3.01 

) 11: 

l 2.9( 

1 2.0! 
b 18! 

> 2 . 0 ; 

) 18- 

7 2.04 

1 28C 

i 2 j; 
) LT 

2 2.01 

7 175 

17 

4.45 

8.40 

3.59 

6.11 

3.20 

5.18 

2.96 

4,67 

181 

4.34 

170 

4.10 

162 

3.93 

155 

3.79 

150 

3.68 

145 

359 

141 

332 

138 

3-45 

133 

335 

229 

3.27 

213 

3.14 

1 219 

► 3.08 

> 215 
1 3.00 

in 

19; 

1 2.01 

2 284 

1 2.04 

b 17! 

i io; 

) 174 

l 1.9S 

b 17C 

> 1 . 9 ; 

1 16' 

7 1.96 

7 265 

18 

4.41 

8.28 

3.55 

6.01 

3.16 

5.09 

2.93 

4.58 

177 

4.25 

166 

4.01 

158 

3.85 

151 

3.71 

146 

3.60 

141 

351 

237 

3.44 

134 

337 

229 

3.27 

225 

3.15 

21! 

3.0' 

> 115 
J 3.0C 

i 111 
1 291 

2.0' 

28. 

7 2.0^ 

3 271 

1 2 CK 
i 171 

) 1.91 

1 261 

l 1.9! 

3 io; 

i 1.9: 
! 2-5! 

I 1.92 
> 157 

19 

4.38 

8.18 

3.52 

5.93 

3.13 

5.01 

190 

4.50 

2.74 

4.17 

163 

3.94 

155 

3.77 

14S 

3.63 

143 

352 

138 

3.43 

234 

336 

131 

330 

126 

3.19 

221 

3.12 

21! 

! 3.(M 

5 111 

) 29; 

1 107 

i 284 

io; 

171 

l 2.<X 

b 274 

) 1.96 

) 24 

b 1* 

3 164 

1 1.91 
) 154 

I.9C 
1 151 

) IXS 

l 149 

20 

4.35 

8.10 

3.49 

5.85 

3.10 

4.94 

187 

4.43 

171 

4.10 

160 

3.87 

252 

3.71 

145 

3.56 

140 

3.45 

135 

337 

231 

330 

128 

3.23 

123 

3.13 

111 

3.0: 

i n; 

> 29- 

l 2jOC 

1 18< 

l 2.04 

> 177 

I.9 4 

26< 

) l.9< 
) 26. 

b 1.9: 

3 154 

l I.9C 
b 15: 

) 1.87 

3 14; 

7 IX! 
r 144 

1 1.84 

1 142 

21 

4.32 

8.02 

3.47 

5.78 

3.07 

4.87 

2.S4 

4.37 

168 

4.04 

157 

3.81 

2.49 

3.65 

142 

3.51 

137 

3.40 

132 

331 

128 

3.24 

125 

3.17 

120 

3.07 

11! 

2* 

> 10! 

) 181 

) 2.0! 

i 18( 

i 100 

1 172 

1 9( 
16. 

b 1.9: 
3 151 

J 1.8! 
5 151 

> ix; 
i 24 ; 

7 1.84 
7 14; 

i i s: 

i 23f 

! 1X1 
236 

22 

4.30 

7.94 

3.44 

5.72 

3.05 

4.82 

182 

431 

2.66 

3.99 

155 

3.76 

147 

3.59 

140 

3.45 

135 

335 

130 

3.26 

2.26 

3.18 

123 

3.12 

218 

3.02 

112 

294 

3 20' 

i 18; 

7 10 j 

5 17! 

I 1.98 
> 267 

1 . 9 : 

15! 

3 1.91 

8 15. 

i ix; 

3 144 

7 1.8- 

b 24; 

1 1X1 
l 137 

l.SC 

23J 

1.78 

231 

23 

4.28 

7.88 

3.42 

5.66 

3.03 

4.76 

2.80 

4.26 

2.64 

3.94 

153 

3.71 

145 

3.54 

2.38 

3.41 

232 

330 

2.28 

3.21 

124 

3.14 

120 

3.07 

214 

297 

1IC 

2.8! 

) 10- 

> 171 

1 2.0C 
5 17C 

) 1.96 

1 262 

) 19 
i 15. 

1 1.8! 
3 14. 

b 1.8- 

3 241 

i ix; 
i 2j; 

l 179 

7 3-32 

1.77 

22* 

1.76 

126 

24 

4.26 

7.82 

3.40 

5.61 

3.01 

4.72 

178 

4.22 

162 

3.90 

2.51 

3.67 

2.43 

3.50 

2.36 

336 

130 
3 05 

2.26 

3.17 

122 

3.09 

218 

3.03 

113 

192 

10 ! 

18! 

> io: 

? 27. 

l 1.98 

4 164 

} 1.94 

> 158 

1 1.8< 

; 14! 

3 1.84 

3 12 

b ix; 

1 23- 

l 1 sc 

1 23J 

1 1.76 
127 

1.74 

22J 

1.73 

8-21 

25 

4.24 

7.77 

3.38 

557 

2.99 

4.68 

2.76 

4.18 

160 

3.86 

149 

163 

141 

3.44 

134 

332 

2.28 

3.21 

124 

3.13 

120 

3.05 

216 

190 

111 

28S 

2« 
► 181 

i 2.0< 

l 271 

) \M 
) 16; 

> 1.92 
! 154 

L 18 

1 14! 

7 IX- 

5 144 

$ l.SC 

) 23 ; 

) 1.77 

! 22! 

1.74 
2 2} 

1.72 

119 

& 

26 

4.22 

7.72 

3.37 

6.53 

2.98 

4.64 

174 

4.14 

159 

3.82 

2.47 

3.59 

139 

3.45 

232 

3.29 

127 

3.17 

122 

3.09 

118 

3.02 

215 

196 

11C 

18C 

I 20! 

J IT 

b 1.9! 
7 161 

} 1.9! 
0 251 

» I.9C 
i 150 

1.8! 

24 

b 1.8: 

l 134 

» 175 

b 225 

I 1.71 
U 22 

1.72 

119 

1.70 

215 

lit 

i.U 


Reprinted by permission from Statistical Methods by G. W. Snedecor and W. G. Cochran,
6th ed., © 1967, by the Iowa State University Press, Ames, Iowa.




Table H. Critical values of F (continued).

Columns: degrees of freedom for greater mean square.
Rows: degrees of freedom for lesser mean square.

Column headings: 1  2  3  4  5  6  7  8  9  10  11  12  14  16  20  24  30  40  50  75  100  200  500  ∞

27 

4.21 

3.35 

2.96 

2.73 

2.57 

2.46 

2.37 

2.30 

2.25 

2.20 

2.16 

2.13 

2.08 

<1 Ol 

2.03 

1.97 

2.68 

1.93 

2.55 

1.88 

2.47 

1.84 

238 

1.80 

2.33 

1.76 

2.25 

1.74 

2.21 

1.71 

2 16 

1.68 

1.67 


7.68 

5.49 

4.60 

4.11 

3.79 

3.58 

3.39 

3.26 

3.14 

3.08 

2.98 

2.93 

2 .oJ 

/*♦ 




••ID 


*40 

28 

4.20 

7.64 

3.34 

6.45 

2.95 

4.57 

2.71 

4.07 

2.56 

3.76 

2.44 

3.53 

2.36 

336 

2.29 

3.23 

2.24 

3.11 

2.19 

3.03 

2.15 

2.95 

2.12 

2.90 

2.06 

2.80 

2.02 

2.71 

1.96 

2.60 

1.91 

2.52 

1.87 

2.44 

1.81 

235 

1.78 

2.30 

1.75 

2.22 

1.72 

2.16 

1.69 

2.13 

1.67 

2.09 

1.65 

2.04 

29 

4.18 

7.60 

3.33 

5.42 

2.93 

4.54 

2.70 

4.04 

2.54 

3.73 

2.43 

3.50 

2.35 

3.33 

2.28 

3.20 

2.22 

3.08 

2.18 

3.00 

2.14 

2.92 

2.10 

2.87 

1.05 

2.77 

2.00 

2.68 

1.94 

237 

1.90 

2.49 

1.85 

2.41 

1.80 

232 

1.77 

2.27 

1.73 

2.19 

1.71 

2.15 

1.68 

2.10 

1.65 

2.04 

1.64 

2.03 

30 

4.17 

7.56 

3 . 3 ? 

5.89 

-> O ') 

431 

2.69 

4.02 

2.56 

3.70 

2.42 

3.47 

2.34 

3.20 

2.27 

3.17 

2.21 

3.05 

2.16 

2.98 

2.12 

2.90 

2.09 

2.84 

2.04 

2-74 

1.99 

2.66 

1.93 

2.55 

1.89 

2.47 

1 . S 4 

238 

1.79 

2.29 

1.76 

2.24 

1.72 

2.16 

1.69 

2.13 

1.66 

2.07 

1.64 

2.03 

1.62 

2.01 

32 

4.15 

7.50 

3.30 

5.34 

2.90 

4.46 

2.67 

3.97 

2.51 

3.66 

2.40 

3.42 

2.32 

3.25 

2.25 

3.12 

2.19 

3.01 

2.14 

2.94 

2.10 

2.86 

2.07 

2.80 

2.02 

2.70 

1.97 

2.62 

1.91 

231 

1 . S 6 

2.42 

1.82 

234 

1.76 

2.25 

1.74 

2.20 

1.69 

2.12 

1.67 

2.08 

1.64 

2.02 

1.61 

1.98 

1.59 

1.96 

34 

4.13 

3.28 

2.88 

2.65 

2.49 

2.38 

2.30 

2.23 

2.17 

2.12 

2.08 

2.05 

2.00 

1.95 

1.89 

1.84 

1 . S 0 

1.74 

1.71 

1.67 

1.64 

1.61 

1.59 

1.57 


7.44 

5.29 

4.42 

3.98 

3.61 

3.38 

3.21 

3.08 

2.97 

2.89 

2.82 

2.76 

2.66 

238 

2.47 

238 

2.30 

2.21 

2.15 

2.03 

2.04 

1.98 

1.94 

1.91 

36 

4.11 

3.26 

2.86 

2.63 

2.48 

2.36 

2.28 

2.21 

2.15 

2.10 

2.06 

2.03 

1.98 

1.93 

1.87 

1.82 

1.78 

1.72 

1.69 

1.65 

1.62 

1.59 

1.56 

1.55 


7.39 

5.25 

4.33 

3.89 

3.58 

3.35 

3.18 

3.04 

2.94 

2.86 

2.73 

2.72 

2.62 

234 

2.42 

235 

2.26 

2 * 17 

2.12 

2.04 

2.00 

1.94 

1.90 

1.87 

38 

4.10 

3.25 

2.85 

2.62 

2.46 

2.35 

2.26 

2.19 

2.14 

2.09 

2.05 

2.02 

1.96 

1.92 

1.85 

1.80 

1.76 

1.71 

1.67 

1.63 

1.60 

1.57 

1.54 

1.53 


7.35 

5.21 

4.34 

3.86 

3.54 

3.32 

3.15 

3.02 

2.91 

2.82 

2.75 

2.69 

239 

231 

2.40 

232 

2.22 

2.14 

2.03 

2.00 

1.97 

1.90 

1.85 

1.84 

40 

4.08 

3.23 

2.84 

2.61 

2.45 

2.34 

2.25 

2.18 

2.12 

2.07 

2.04 

2.00 

1.95 

1.90 

1.84 

1.79 

1.74 

1.69 

1.66 

1.61 

1.59 

1.55 

1.53 

1.51 


7.31 

5.18 

4.31 

3.83 

3.51 

3.29 

3.12 

2.99 

2.88 

2.80 

2.73 

2.64 

234 

2.49 

2.27 

2.29 

2.20 

2.11 

2.05 

1.97 

1.94 

1.83 

1.54 

1.51 

42 

4.07 

3.22 

2.83 

2.59 

2.44 

2.32 

2.24 

2.17 

2.11 

2.06 

2.02 

1 99 

1 94 

1 89 

1.82 

1.78 

1.73 

1.68 

1.64 

1.60 

1.57 

1.54 

1.51 

1.49 


7.27 

6.15 

4.29 

3.80 

3.49 

3.26 

3.10 

2.96 

2.86 

2.77 

2.70 

2.64 

234 

2.46 

2.35 

2.26 

2.17 

2.06 

2.02 

1.94 

1.91 

1.85 

1.80 

1.78 

44 

4.06 

3.21 

2.82 

2.58 

2.43 

2.31 

2.23 

2.16 

2.10 

2.05 

2.01 

1 98 

1 92 

1 88 

1.81 

1.76 

1.72 

1.65 

1.63 

1.59 

1.56 

1.52 

1.50 

1.43 


7.24 

6.12 

4.26 

3.73 

3.46 

3.24 

3.07 

2.94 

2.84 

2.75 

2.68 

2.62 

232 

2.44 

232 

2.24 

2.15 

2.04 

2.00 

1.92 

1.88 

1.82 

1.76 

1.75 

46 

4.05 

3.20 

2.81 

2.57 

2.42 

2.30 

2.22 

2.14 

2.09 

2.04 

200 

1 97 

1.91 

1 87 

1 80 

1.75 

1.71 

1.65 

1.62 

1.57 

1.54 

1.51 

1.48 

1.46 


7.21 

5.10 

4.24 

3.76 

3.44 

3.22 

3.05 

2.92 

2.82 

2.73 

2.65 

2.60 

2-50 

2.42 

230 

2.22 

2.13 

2.04 

1.98 

1.90 

1.86 

1.80 

1.76 

1.72 

48 

4.04 

3.19 

2.80 

2.56 

2.41 

2.30 

2.21 

2.14 

2.08 

2.03 

1.99 

1.96 

l 90 

1 86 

1 79 

1.74 

1.70 

1.64 

1.61 

1.56 

1.53 

1.50 

1.47 

1.45 


7.19 

5.08 

4.22 

3.74 

3.42 

3.20 

3.04 

2.90 

2.80 

2.71 

2.64 

2-58 

2.48 

2.40 

2.28 

2.20 

2.11 

2.02 

1.96 

1.88 

1.84 

1.78 

1.72 

1.70 

50 

4.03 

3.18 

2.79 

2.56 

2.40 

2.29 

2.20 

2.13 

2.07 

2.02 

1.98 

1.95 

1.90 


1.78 

1.74 

1.69 

1.63 

1.60 

135 

1.52 

1.48 

1.46 

1.44 


7.17 

5.06 

4.20 

3.72 

3.41 

3.18 

3.02 

2.88 

2.78 

2.70 

2.62 

236 

2.46 

£39 

2.26 

2.18 

2.10 

2.00 

1.94 

1.86 

1.82 

1.76 

1.71 

1.65 

55 

4.02 

3.17 

2.78 

2.54 

2.38 

2.27 

2.18 

2.11 

2.05 

2.00 

1.97 

1.93 

1.88 

1.83 

1.76 

1.72 

1.67 

1.61 

1.58 

1.52 

1.50 

1.46 

1.43 

1.41 


7.12 

5.01 

4.16 

3.68 

3.37 

3.15 

2.98 

2.85 

2.75 

2.66 

2.59 

233 

2.43 

235 

2.23 

2.15 

2.06 

1.96 

1.90 

1.82 

1.78 

1.71 

1.66 

1.64 

60 

4.00 

3.15 

2.76 

2.52 

2.37 

2.25 

2.17 

2.10 

2.04 

1.99 

1.95 

1.92 

1.86 

1.81 

1.75 

1.70 

1.65 

139 

1.56 

130 

1.48 

1.44 

1.41 

1.39 


7.08 

4.98 

4.13 

3.65 

334 

3.12 

2.95 

2.82 

2.72 

2.63 

2.56 

230 

2.40 

232 

2.20 

2.12 

2.03 

1.93 

1.87 

1.79 

1.74 

1.68 

1.63 

1.60 

65 

3.99 

3.14 

2.75 

2.51 

2.36 

2.24 

2.15 

2.08 

2.02 

1.98 

1.94 

1.90 

1.85 

1.80 

1.73 

1.68 

1.63 

137 

1 54 

1.49 

1.46 

1.42 

1.39 

1.37 


7.04 

4.95 

4.10 

3.62 

331 

3.09 

2.93 

2.79 

2.70 

2.61 

2.54 

2.47 

237 

230 

2.18 

2.09 

2.00 

1.90 

1.84 

1.76 

1.71 

1.64 

1.60 

1.56 

70 

3.98 

3.13 

2.74 

2.50 

2.35 

2.23 

2.14 

2.07 

2.01 

1.97 

1.93 

1.89 

1.84 

1.79 

1.72 

1.67 

1.62 

136 

1.53 

1.47 

1 45 

1 40 

1.37 

1.35 


7.01 

4.92 

4.08 

3.60 

3.29 

3.07 

2.91 

2.77 

2.67 

259 

2.51 

2.45 

235 

2.28 

2.15 

2.07 

1.98 

1.88 

1.82 

1.74 

1.69 

1.62 

1.56 

> 1.53 

80 

3.96 

6.96 

3.11 

4.33 

2.72 

4.04 

2.48 

3.56 

2.33 

3.25 

2.21 

3.04 

2.12 

2.67 

2.05 

2.74 

1.99 

2.64 

1.95 

255 

1.91 

2.48 

1.88 

2.41 

1.82 

232 

1.77 

2.24 

1.70 

2.11 

1.65 

2.03 

1.60 

1.94 

1.54 

1.84 

1.51 

1.78 

1.45 

1.70 

1.42 

1.65 

1.38 

137 

1.35 

1.52 

1 1.32 

: 1.49 

100 

3.94 

6 . 9 C 

3.09 

4.82 

2.70 

3.98 

2.46 

3.51 

2.30 

3.20 

2.19 

2.98 

2.10 

2.82 

. 2.03 

2.69 

1.97 

2.59 

1.92 

2.51 

1.88 

2.43 

1.85 

236 

1.79 

2.26 

1.75 

2.19 

1.68 

2-06 

1.63 

1.98 

1.57 

1.89 

131 

1.79 

1.48 

1.73 

1.42 

1.64 

1.39 

139 

1.34 

131 

1 . 3 C 

1.45 

1 1.28 
; 1.43 

125 

3.92 

6.84 

3.07 

4.78 

2.68 

3.94 

2.44 

3.47 

2.29 

3.17 

2.17 

2.95 

2.08 

2.79 

2.01 

2.65 

1.95 

2.56 

1.90 

2.47 

1.86 

2.40 

1.83 

233 

1.77 

2.23 

1.72 

2.15 

1.65 

2.03 

1.60 

1.94 

1.55 

1.85 

1.49 

1.75 

1.45 

1.68 

1.39 

1.59 

1.36 

134 

1.31 

1.46 

1.27 

1.40 

' 1.25 

137 

150 

3.91 

6.81 

3.06 

L 4.75 

2.67 

3.91 

2.43 

3.44 

2.27 

3.14 

2.16 

2.92 

2.07 

: 2.76 

2.00 

2.62 

1.94 

2.53 

1.89 

2.44 

1.85 

2.37 

1.82 

230 

1.76 

2.20 

1.71 

2.12 

1.64 

2.00 

1.59 

1.91 

1.54 

1.83 

1.47 

1.72 

1.44 

1.68 

1.37 

1.56 

1.34 

131 

1.29 

1.42 

1.25 

137 

1.22 

133 

200 

3 . 8 < 

6 . 7 < 

) 3.04 

> 4.71 

2.65 

3.58 

2.41 

3.41 

2.26 

3.11 

i 2.14 
2 . 9 C 

1 2.05 

> 2.73 

1.98 

2.60 

1.92 
» 2.50 

1.87 

► 2.41 

1.83 

2.34 

1.80 

2.28 

1.74 

2.1? 

1.69 

2.09 

1.62 

1.97 

137 

1.88 

1.52 

1.79 

1.45 

1.69 

1.42 

1.62 

1.35 

133 

1.32 

1.48 

1.26 

139 

1.22 

132 

1.19 

1.30 

400 

3 . 8 < 

) 3.02 

1 2.62 

2.39 

1 2.23 

1 2.12 

1 2.03 

1 1.96 

• 1 . 9 C 

1 1.85 

1 81 

1 *70 

1.72 

2.12 

1.67 

2.04 








1.22 

132 

1.16 

1.24 

1 13 


6 . 7 ( 

) 4.66 

» 3.83 

i 3.36 

► 3.06 

► 2.85 

J 2.69 

► 2.55 

1 2.46 

► 2.37 

’ 2.29 

l./o 
> 2-23 

1.60 

1.92 

134 

1.84 

1.49 

1.74 

1.42 

1.64 

1.38 

137 

1.32 

1.47 

1.28 

1.42 

1 . 1 * 

1000 

' 3 . 8 ! 

6.61 

5 3 . 0 C 

i 4 . 6 ] 

) 2.61 
l 3 . 8 C 

2 . 3 * 

I 3.34 

l 2 . 2 ] 

1 3.0 A 

\ 2 . 1 C 

i 2 . 8 : 

) 2 . 0 ] 

l 2.66 

l 1.95 

i 2.53 

i 1 . 9 * 
1 2.42 

l 1.84 
1 2.34 

1.80

2.26

1.76

2.20

1.70 

2.09 

1.65 

2.01 

1.58 

1.89 

1.53

1.81 

1.47 

1.71 

1.41 

1.61 

1.36

1.54

1.30 

1.44 

1.26

1.38

1.19 

1.28 

1.13 

1.08

1.11

∞

, 3 . 8 - 

, 6 . 6 - 

a 

4 2 . 9 < 

4 4 . 6 ( 

> 2 . 6 C 

) 3 . 7 i 

) 2 . 3 ' 
1 3 . 3 : 

2.21

! 3 . 0 ; 

2.05

l 2 . 8 < 

2.01

) 2 . 6 - 

1.94

2.31

1 1 . 8 * 
2.41

1.83

l 2 . 3 ] 

1.79
! 2 .M 

1.75

2.18

1.6 

2.0 

1.64 

1.99 

1.57 

1.87 

1.52

1.79 

1.46 

1.69 

1.40 

1.62 

1.35

1.52

1.28 

1.41 

1.24 

1.34

1.17

1.25

1.11

1.15 

1.00

1.00



INDEX 


a posteriori tests 301-302
a priori tests 300-302, 304-305, 311
acceptance region 113-115 

age scores 272

alternate-forms method 259-260

alternative hypothesis (WJ 
analysis of variance 2B/-M 
assumptions 289-29 
models 288-289, 294, 299-300, 326
nonparametric 310-315, 340 
one-way anova 288, 291-300, 303-309, 326, 341 
two-way anova 288, 315-326 
assumptions 118 


bar diagram 20-22 

Bernoulli distribution 101, 104-105 

Binet-Simon scale 272 

binomial distribution 100-105, 107, 216, 221 
biserial r 183-186, 254 
bivariate distribution 156 
bivariate statistics 144 
Bonferroni method 301, 305 


C scale 275-276 

central moments 67 

chi square tests 208-221, 335 

classification variables 5, 246, 281-282, 289, 291 

clumped distribution 71 

coefficient 

of dispersion 71, 102-103, 106-107 
of equivalence 259-260 
of internal consistency 261 
of quartile deviation 71 
of reliability 257 
of stability 259 

of stability and equivalence 260 
of validity 264 
of variation 70 
composite rank test 231-237 
concurrent validity 267 
confidence interval 97, 99-100 
construct validity 266 
content validity 265-266
contingency coefficient 191-193 


convergent validity 266, 268
correlation 12, 144-145 

linear 145-162, 287, 336-337 

multiple 164-167, 287 

nonparametric 167-178, 187-189, 191-193

partial 162-166, 287 
table 156-159, 268 
criterion-related validity 266-267 
critical region 113, 115-117 
critical scores 88-89, 97, 


i h i i a 1 71- 124. 126 


decile 47 

degrees of freedom 82, 123 
dependent variables 4-5, 247, 281-285, 287-288 
descriptive statistics 11-12 
deviation IQ 277-278 
difference between means 110-111 
standard error 74, 78-79 
tests for significance 111, 117-143 
difficulty value 249-253 
discriminant validity 266, 268 
discriminatory value 249, 253-256 

equivalent-forms reliability 259-260 
equivalent groups 139-140, 284 
error 

of inference 114-115 
of measurement 257 
of prediction 196 
error variance 292, 317 
expectancy table 268 
experimental design 281-288 
exponential curve 37-38 


F ratio 166-167, 288, 291-294, 296, 298-305, 307-309, 317-319, 321-323, 325, 327, 341
F test, Scheffé’s 301-302, 304
factor analysis 268-270 
factor loading 270 
factor theory 269-270 
factor validity 270 
factorial experiment 285 




Fisher’s z 147-148, 155
frequency 

cumulative 30 

distribution 15-19, 22-30, 156
curve 30 
polygon 22-26 


norm 270-278 

normal distribution 86-92, 98 
best-fitting 89-92 
normalized standard scores 273 
null hypothesis (H 0 ) 111-116 


G test 222-227 
Gabriel’s SS-STP 302 
geometric mean 45-46 

histogram 26-30 


ogive 30-33 

omega square 294, 299-300, 305, 341
one-tail test 88, 116-117, 124-125, 128-130, 132, 137-139, 142-143

one-way anova 288, 291-300, 303-309, 326, 341


homoscedasticity 118, 123, 290

independent group experiment 118-132, 283-284, 330-331, 333

independent variables 5-6, 247-248, 281-285

index of discrimination 254-256 

interval scale 3 

intervening variables 249 

intrinsic validity 264

item analysis 249-256 

item-criterion correlation 184, 187, 189, 254

item-item correlation 187, 254 

item-total correlation 179, 184, 253-254

item validity 253 

Kendall’s tau 172-178, 287
Kuder-Richardson formulae 261-263
Kruskal-Wallis test 310-315, 340
kurtosis 94-95 


parallel-forms method 259-260 
parameter 11
partial correlation 162-164 
part-whole correlation 258
percentile 46-48 
rank 48-49 
scale 271-272 

phi coefficient 187-189, 254-256
pie diagram 20 

point-biserial r 179-183, 253-254
Poisson distribution 105-108, 216
population 6-7 
power test 250 
prediction statistics 12, 193
predictive validity 266-267 
probability 84-85 
distributions 85-108 

product-moment r 133, 139-142, 145-162, 287, 329


linear graph 34-37 

linear transformation 80-81, 271

Mann-Whitney U test 237-241, 311-313, 334, 340

matched-pair groups 139-140, 284

mean 41-45 

mean deviation 57

median 49-53 

median test 242-245 

mental age 272

mode 53-55 

multiple comparison 300-302, 304-305 
multiple correlation 164-167, 287 
multivariate statistics 144, 162, 164

nonlinear transformations 81, 271, 290


prospective method 282-283 

quantiles 41, 46-48
quartile 47, 67-71 
deviation 67-71 


^• 09-285 


randomized block 
range 56-57
ratio IQ 272-273 
ratio scale 3 

rational equivalence 261 

regression 12, 193-lQs 
models 194-195 
multiple 194, 203-206

nyecLT 195 - 203 - 33 <W39 

rejection region 113, 115-117
relevant validity 264 




relevant variables 6, 248, 281-283, 287 
reliability 249, 257-263, 267-268, 287-288 

repulsed distribution 71 
retrospective method 281-282 


sum of squares (contd.) 
residual 317, 319 
total 291, 315-316 
within-groups 292 


sample 7 
size 285

sampling 7-11, 286-287

sampling distributions 73-74 
sampling errors 73, 110
sampling statistics 12, 73-83
scattergram 33-34 
semi-logarithmic graph 38-39 
signed rank test 227-231 
significance level 112-114 

significance test 110-111, 117-143, 329-334 
single-group experiment 132-143, 284, 332 


skewness 92-94 

Spearman-Brown formulae 260-261 
Spearman’s rho 167-171, 287


speed test 250 
split-half reliability 260-261 
standard deviation 57-64, 70 
standard error 74-80 

of differences 74, 78-80, 110 
of estimate 196, 201, 204-206, 264 
of mean 74-79 


of measurement 258 
of proportion 78-80 
standard score 80-81, 273 
Stanford-Binet scale 272-273, 278 


stanine 276-277 


statistics 1 

of dispersion 12, 50-72 
of location 11-12, 41-55
nonparametric 208 
prediction 12 
sampling 12, 73-83 
sum of squares 

between-columns 316, 318 
between-groups 291 
between-rows 316, 318 
critical 302 
interaction 316-317 

partitioning 291-292, 294-297, 299-300, 303-307, 
309, 315-319, 341 


t distribution 95-98 
T scale 273-275

t score 95-100, 123-143, 301, 304-305, 329-332, 336-337

T statistic 228-229, 231-237
t tests 123-143, 301, 304-305, 329-332, 336-337
multiple comparison 301, 304-305
test construction 249-250
test-retest reliability 259
tetrachoric r 189-191, 254

treatment variables 5, 248, 281-282, 287-289, 294, 297, 303, 305

truncated distributions 184
two-tail test 88-89, 115-122, 124, 126-129, 131-137, 140-142

two-way anova 288, 315-326
unit normal curve 86-92 


validity 249, 263-268, 287-288 
variables 2-6, 247-249, 281-285, 287 
variance 64-67, 71 

added treatment component 289 

added variance component 289, 294, 296, 308 

between-columns 316, 318 

between-groups 292 

between-rows 316, 318

error 292, 317 

interaction 316-317, 319 

ratio (see F ratio) 

residual 317 

total 292, 316 

within-cells 317 

within-groups 292 


Wechsler intelligence scale 277 
Wilcoxon’s rank tests 227-237, 287 


Yates’ correction 210, 214-216, 222, 225, 227

z score 80-81, 86-92, 97-100, 117-123, 185-186, 238, 240-241, 313-314, 333-334




